Text File | 1994-09-04 | 181.2 KB | 3,806 lines |
- CHAPTER 6
-
- FILE AND DEVICE HANDLING
-
-
- At some point, all but the most trivial computer programs will need to
- store and retrieve data using a disk file. Data files are used for two
- primary purposes: to hold information when there is more than can fit into
- the computer's memory all at once, and to provide a permanent, non-volatile
- means of storage. Files are also used to allow data from one computer to
- be used on another. Such data sharing can be as simple as a "sneaker net"
- system, whereby a floppy disk is manually carried from one PC to another,
- or as complex as a multi-user network where disk data can be accessed
- simultaneously by several users.
- Although there are two fundamentally different types of disk drives,
- floppy and fixed [not counting CD-ROM drives, which are removable], they
- are accessed identically using the same BASIC statements. BASIC's file
- commands may also be used to communicate with devices such as a printer or
- modem, and even the screen and keyboard. There are many ways to manipulate
- files and devices, and some are substantially faster than others. By
- understanding fully how BASIC interacts with DOS, file access in your
- programs can often be sped up by a factor of five or even more.
- In this chapter I will address the fundamental aspects of file and
- device handling, and provide specific examples of how to achieve the
- highest performance possible. I will begin with an overview of how DOS
- organizes information on a disk, and then continue with practical examples.
- Unlike earlier chapters in which only short program fragments were shown,
- several complete programs and subprograms will be presented to illustrate
- the most important of these techniques in context. I will also describe
- the underlying theory of how disks are organized, and explain why this is
- important for the BASIC programmer to know.
- In Chapter 7 the subject of files will be continued; there you will
- learn how to write programs for use with a network, and also how relational
- databases are constructed. In particular, coverage of these two very
- important subjects is severely lacking in the documentation that comes with
- Microsoft BASIC. As personal computers continue to permeate the office
- environment, networks and databases are becoming ever more common. Many
- programmers find themselves in the awkward position of having to write
- programs that run on a network, but with no adequate source of information.
-
-
- DISK FILE FUNDAMENTALS
- ======================
-
- All disks used with MS-DOS are organized into groups of bytes called
- *sectors*, and these sectors are further combined into *clusters*. DOS
- keeps track of every file on a disk, but with this organization DOS needs
- to remember only the cluster number at which each file begins. The minimum
- amount of disk space that is allocated by DOS is one cluster. Therefore,
- if you create a very small file--say, ten bytes--an entire cluster is
- allocated to that file, and then marked as unavailable for other use.
- In most cases, each disk sector holds 512 bytes; however, one
- exception is when you use a RAM disk to simulate a disk drive in memory.
- Many RAM disk programs let you specify a smaller sector size, to minimize
- waste when there are many small files. The number of sectors that are
- stored in each cluster depends on the type of disk and its size. For
- example, a 360K floppy disk stores two sectors in each cluster, and a 32 MB
- hard disk formatted using DOS 3.3 stores four sectors in each cluster.
- Therefore, the minimum unit of storage allocation for these disks is 1K
- (1024 bytes), and 2K (2048 bytes) respectively. DOS 2.x offers less room
- to store cluster numbers, and must combine more sectors into each cluster.
- A 20MB hard disk formatted with DOS 2.1 allocates 8K for even a one-line
- batch file!
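-
-
- The arithmetic behind these figures can be sketched in a few lines. This
- is an illustration only; the variable names are assumed for the example,
- and the sector and cluster sizes are the 360K floppy values given above.
-
-
- BytesPerSector& = 512
- SectorsPerCluster& = 2                  '360K floppy disk
- ClusterSize& = BytesPerSector& * SectorsPerCluster&
-
- FileSize& = 10                          'the ten-byte file
- 'round up to a whole number of clusters
- Clusters& = (FileSize& + ClusterSize& - 1) \ ClusterSize&
- PRINT "Disk space used:"; Clusters& * ClusterSize&
-
-
- With these values a ten-byte file occupies one 1,024-byte cluster.
- Changing SectorsPerCluster& to 4 models the 32 MB hard disk, where the
- same file occupies 2,048 bytes.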
- As files are created and appended, DOS allocates new space to hold the
- file contents. By allocating disk space in units, DOS is also able to
- minimize disk fragmentation. As you learned in Chapter 2, BASIC manages
- variable-length strings by claiming new memory as necessary. When
- available memory is exhausted BASIC compacts its string space, overwriting
- abandoned string data with strings that are still active.
- This method is not practical with disk files, because copying data
- from one part of the disk to another for the purpose of compaction would
- take an unacceptable amount of time. Therefore, DOS initially allocates an
- entire cluster for each file, to provide space for subsequent data. When
- the ten-byte file mentioned earlier is added to, space on the disk has
- already been set aside for all or part of the new data that will be
- written. And when the first cluster's capacity is exceeded, DOS allocates
- an entire second cluster to hold the additional data.
- Even though it is common for a disk to become fragmented, allocating
- clusters that are comprised of groups of contiguous sectors greatly reduces
- the number of individual fragments that must be accessed. The track,
- sector, and cluster makeup of a 360k 5-1/4 inch floppy disk is shown in
- Figure 6-1.
-
-
- Figure 6-1: Sector and cluster organization for a 360k floppy disk.
- [Sorry, this figure is not available.]
-
-
- This disk is divided into 40 circular tracks, and each track is further
- divided into nine sectors. One sector holds 512 bytes, and each pair of
- sectors is combined to form a single cluster. For a 360k disk, no file
- fragment will ever be smaller than two sectors, since this is the minimum
- amount of space that DOS allocates. Likewise, a hard disk that combines
- four sectors into each cluster will never be divided into pieces smaller
- than four sectors.
- Please understand that tracks and sectors are physical entities that
- are magnetically encoded onto the disk when it is formatted--it is DOS that
- treats each pair of sectors as a single cluster. Note that since a 360k
- disk stores nine sectors on each track, some clusters will in fact span two
- tracks.
- Using the disk in Figure 6-1 as an example, the first short file that
- is written to it will be placed in cluster 1 (sectors 1 and 2), even if the
- file does not fill both sectors. The second file written to this disk will
- then be stored starting at cluster 2 (sectors 3 and 4). If the first file
- is later extended beyond the 1,024 bytes that can fit into cluster 1, the
- excess will be added beginning at cluster 3 (sectors 5 and 6). Thus, when
- DOS reads the first file sequentially, it must read cluster 1, skip over
- cluster 2, and then continue reading at cluster 3.
- Of course, this takes longer than reading a file that is contiguous,
- because the disk drive must wait until the second file's intervening
- sectors have passed beneath it. This problem is compounded by additional
- head movement when the fragmentation extends across more than one track, as
- well as by other timing issues.
- There are also three special areas on every disk: the boot sector, the
- Disk Directory and the File Allocation Table (FAT). DOS uses the directory
- and FAT to know the name of each file, and where on the disk its first
- cluster is located. For simplicity, these are not shown in Figure 6-1;
- they are stored before any files on a disk.
- When a 360K floppy disk is formatted, DOS sets aside room for 112
- directory entries. Each entry is 32 bytes long, and holds the name of each
- file on the disk, its current size, the date and time it was last written
- to, its attribute (hidden, read-only, and so forth), and starting cluster
- number. When you open a file, DOS searches each directory entry for the
- file name you specified, and once found, goes to the first cluster that
- holds the file's data.
- The disk's FAT contains one entry for every cluster in the data area,
- to show which clusters are in use and by which file. The FAT is organized
- as a linked list, with each entry pointing to the next cluster in the same
- file. The last cluster in a file is identified with a special value. The
- FAT also holds other
- special values to identify unused, reserved, and defective clusters.
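-
-
- The chain of FAT entries can be modeled with an ordinary array. The sketch
- below is a simplified illustration only: the array, the end-of-file marker
- value, and the cluster numbering are assumptions made for clarity, and do
- not reflect the actual on-disk FAT format. It follows the first file from
- the earlier example, which occupies clusters 1 and 3.
-
-
- CONST EndOfFile% = -1        'hypothetical end-of-chain marker
- DIM Fat%(1 TO 3)
-
- Fat%(1) = 3                  'file 1: cluster 1 continues at cluster 3
- Fat%(2) = EndOfFile%         'file 2 occupies cluster 2 only
- Fat%(3) = EndOfFile%         'cluster 3 is the end of file 1
-
- Cluster% = 1                 'starting cluster, from the directory entry
- DO
-   PRINT "Read cluster"; Cluster%
-   Cluster% = Fat%(Cluster%)  'follow the link to the next cluster
- LOOP UNTIL Cluster% = EndOfFile%
-
-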
- Because there are a fixed number of directory entries on a disk, it is
- possible to receive a "Disk full" message when attempting to open a new
- file, even when there is sufficient data space. The root directory of a
- 360K floppy disk is limited to 112 entries, and a 1.2MB disk can hold up to
- 224 file names. Notice that a volume label takes one directory entry,
- although no data space is allocated to it. Unlike the root directory on a
- disk, subdirectories that you create are not limited to an arbitrary number
- of file name entries. Rather, a subdirectory *is* in fact a file, and it
- can be extended indefinitely until there is no more room on the disk.
- Fortunately, most programmers do not have to deal with disk access at
- this level. When you ask BASIC to open a file and then read from or write
- to it, DOS handles all the low-level details for you. However, I think it
- is important to have at least a rudimentary understanding of how disks are
- organized. If you are interested in learning more about the structure of
- disks and data files, I recommend Peter Norton's *Programmer's Guide to the
- IBM PC & PS/2*. This excellent reference is published by Microsoft Press,
- and can be found at most major book stores.
-
-
- DISK-LIKE DEVICES
- =================
-
- A device is related to a file in that you can open it using BASIC's OPEN
- command, and then access it with GET # and PRINT # and the other file-
- related BASIC statements. There are a number of devices commonly used with
- personal computers, and these include printers, modems, tape backup units,
- and the console (the PC's keyboard and display screen). Some of these
- devices are maintained by DOS, and others are also controlled by BASIC.
- For example, when you open "SCRN:" for Output mode in a BASIC program,
- BASIC takes responsibility for displaying the characters that you print.
- However, if you instead open "CON", BASIC merely sends the data to DOS,
- which in turn sends it to the display screen. Any device whose name is
- followed by a colon is considered to be a BASIC device; the absence of a
- trailing colon indicates a DOS device. This is important to understand,
- because there may be situations when you want to route your program's
- output directly through DOS, and not have it be intercepted by BASIC.
- One such situation would be when printing the special control
- characters that the ANSI.SYS device driver recognizes. Normally, BASIC
- processes data in a PRINT statement by writing directly to screen memory.
- This provides the fastest response, which is of course desirable in most
- programs. But ANSI.SYS operates by intercepting the stream of characters
- sent through DOS. Since BASIC normally bypasses DOS for screen operations,
- ANSI.SYS never gets a chance to see those characters.
- Another reason for printing through DOS is to activate TSR (Terminate
- and Stay Resident) programs that intercept the BIOS video routines. (When
- data is sent through DOS for display, DOS merely passes it on to the BIOS
- routines which do the real work.) For example, some early screen design
- utilities use this method, to accommodate multiple programming languages by
- avoiding the differences in calling and linking. Therefore, to activate,
- say, a pop-up help screen, you are required to print a special control
- string. One such utility uses two CHR$(255) bytes followed by the name of
- the screen to be displayed.
- Although this method is very clumsy when compared to newer products
- that provide BASIC-linkable object files, it is simpler for the vendor than
- providing different objects for each supported language. This also allows
- screens to be displayed from within a batch file using the ECHO command.
- Therefore, if you need to send data through DOS or the BIOS for whatever
- reason, you would open and print to the "CON" device, instead of using
- normal PRINT statements or printing to the "SCRN:" device.
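-
-
- As a minimal sketch of printing through DOS, the fragment below sends a
- few ANSI escape sequences to the "CON" device. It assumes that ANSI.SYS
- has been installed with a DEVICE= statement in CONFIG.SYS; without it the
- escape characters are simply displayed on the screen.
-
-
- OPEN "CON" FOR OUTPUT AS #1
- PRINT #1, CHR$(27); "[2J";              'ANSI: clear the screen
- PRINT #1, CHR$(27); "[1m"; "Bold text"; 'ANSI: high intensity on
- PRINT #1, CHR$(27); "[0m"               'ANSI: all attributes off
- CLOSE #1
-
-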
- One final point worth mentioning is the value of using the same syntax
- for both files and devices. Many programs let the user specify where a
- report is to be sent--either to a disk file, a printer, or the screen.
- Rather than duplicate similar code three times in a program, you can simply
- assign a string variable to the appropriate device or file name. This is
- shown in the listing below.
-
-
- PRINT "Printer, Screen, or File? (P/S/F): ";
-
- DO
- Choice$ = UCASE$(INKEY$)
- LOOP UNTIL INSTR(" PSF", Choice$) > 1
-
- IF Choice$ = "P" THEN
- Report$ = "LPT1:"
- ELSEIF Choice$ = "S" THEN
- Report$ = "SCRN:"
- ELSE
- PRINT
- LINE INPUT "Enter a file name: ", Report$
- END IF
-
- OPEN Report$ FOR OUTPUT AS #1
- PRINT #1, Header$
- PRINT #1, SomeStuff$
- PRINT #1, MoreStuff$
- ...
- ...
- CLOSE #1
- END
-
-
- Here, the same block of code can be used regardless of where the report is
- to be sent. The only alternative is to duplicate similar code three times,
- using PRINT statements if the screen was specified, LPRINT if the user wants
- the printer, or PRINT # if the report is being sent to a file. Of course, this
- example could be further expanded to prompt for a printer number (1, 2, or
- 3) if a printer is specified.
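-
-
- One possible way to add that prompt is sketched below. It reuses the INSTR
- test from the listing above; Port$ is a name assumed here for illustration,
- and this block would replace the single line that assigns "LPT1:".
-
-
- IF Choice$ = "P" THEN
-   PRINT
-   PRINT "Printer number? (1/2/3): ";
-   DO
-     Port$ = INKEY$
-   LOOP UNTIL INSTR(" 123", Port$) > 1
-   Report$ = "LPT" + Port$ + ":"
- END IF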
-
-
- EXPLORING DATA FILES
- ====================
-
- All data is stored on disk as a continuous stream of binary information,
- regardless of how the file was opened. Even though BASIC and other
- languages offer a number of different file access methods, all disk files
- merely contain a series of individual bytes. When you open a file for
- random access, you are telling BASIC that it is to treat those bytes in a
- particular manner. In this case, the file is comprised of one or more
- fixed-length records. Thus, BASIC can perform many of the low level
- details that help you to organize and maintain that data.
- Likewise, opening a file for INPUT tells BASIC that you plan to read
- variable-length string data. Rather than reading or writing a single block
- of a given length, BASIC instead knows to continue to read bytes from the
- file until a terminating comma or carriage return is encountered. However,
- in both of these cases the disk file is still comprised of a series of
- bytes, and the access method you specify merely tells BASIC how it is to
- treat those bytes.
- The short program below illustrates this in context, and you can
- verify that all three files are identical using the DOS COMP utility
- program.
-
-
- OPEN "File1" FOR OUTPUT AS #1
- PRINT #1, "Testing"; SPC(13);
- CLOSE
-
- OPEN "File2" FOR BINARY AS #1
- Work$ = "Testing" + SPACE$(13)
- PUT #1, , Work$
- CLOSE
-
- OPEN "File3" FOR RANDOM AS #1 LEN = 20
- FIELD #1, 20 AS Temp$
- LSET Temp$ = "Testing"
- PUT #1
- CLOSE
- END
-
-
- In fact, even executable program files are indistinguishable from data
- files, other than by their file name extension. Again, it is how you
- choose to view the file contents that determines the actual form of the
- data.
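-
-
- The three files written by the earlier program can also be compared from
- within BASIC, by reading each one in its entirety as a binary file. This
- is only a sketch: the Buf$ array is a name assumed for illustration, and
- the approach works only for files small enough to fit in a string.
-
-
- DIM Buf$(1 TO 3)
- FOR N% = 1 TO 3
-   OPEN "File" + LTRIM$(STR$(N%)) FOR BINARY AS #1
-   Buf$(N%) = SPACE$(LOF(1))            'make room for the entire file
-   GET #1, , Buf$(N%)                   'read as many bytes as the string holds
-   CLOSE #1
- NEXT
-
- IF Buf$(1) = Buf$(2) AND Buf$(2) = Buf$(3) THEN
-   PRINT "All three files are identical"
- ELSE
-   PRINT "The files differ"
- END IF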
-
-
- FILE BUFFERS
-
- Before I explain the various file access methods that BASIC provides, there
- is one additional low-level detail that needs to be addressed: file
- buffers. A file buffer is a portion of memory that holds data on its way
- to and from a disk file, and it is used to speed up file reads and writes.
- As you undoubtedly know, accessing a disk drive is one of the slowest
- operations that occurs on a PC. Because disk drives are mechanical, data
- being read or written requires a motor that spins the actual disk, as well
- as a mechanism to move the drive head to the appropriate location on the
- disk surface. Even if a file is located in contiguous disk clusters, a
- substantial amount of mechanical activity is required during the course of
- accessing a large file.
- When you open a file for reading, DOS uses a section of memory that it
- allocated on bootup as a disk buffer. The first time the file is accessed,
- DOS reads an entire sector into memory, even if your program requests only
- a few bytes. This way, when your program makes a subsequent read request,
- DOS can retrieve that data from memory instead of from the disk. This
- provides an enormous performance boost, since memory can be accessed many
- times faster than any mechanical disk drive. Even if the next portion of
- data being read is located in the same sector, the disk drive must wait for
- the disk to spin until that sector arrives at the magnetic read/write head.
- When using a floppy disk the time delays are even worse. Once a
- second or two have passed after accessing a floppy disk, the motor is
- turned off automatically. Having to then restart it again imposes yet
- another one or two second delay.
- Similarly, when you write data to a file DOS simply stores the data in
- the buffer, instead of writing it to the disk. When the buffer becomes
- full (or when you close the file--whichever comes first), DOS writes the
- entire buffer contents to the disk all at once. Again, this is many times
- faster than accessing the physical drive every time data is written.
- You can control the amount of memory that DOS sets aside for its
- buffers with a BUFFERS= statement in the PC's CONFIG.SYS file. For each
- buffer you specify, 512 bytes of memory is taken and made unavailable for
- other uses. Even though you might think that more buffers will always be
- faster than fewer, this is not necessarily the case. For each buffer, DOS
- also maintains a table that shows which disk sectors the buffer currently
- holds. At some point it can actually take longer for DOS to search through
- this table than to read the sector from disk. Of course, this time depends
- on the type of disk (floppy or hard), and the disk's access speed.
- Although DOS' use of disk buffers greatly improves file access speed,
- there is still room for improvement. Each call to DOS to read or write a
- file takes a finite amount of time, because most DOS services are handled
- by the same interrupt service routine. Which particular service a program
- wants is specified in one of the processor's registers, and determining
- which of the many possible services has been requested takes time.
- To further improve disk access performance, BASIC performs additional
- file buffering using its own routines. Since BASIC's buffers are usually
- located in near memory, they can also be accessed very quickly; additional
- steps are needed to access data outside of DGROUP. However,
- BASIC PDS [and VB/DOS] store file buffers in the same segment used for
- string variables, so there is slightly less improvement when far strings
- are being used. When you open a random access file, a block of memory
- large enough to hold one entire record is set aside in string memory. If a
- record length is given as part of the OPEN command with LEN =, BASIC uses
- that for the buffer size. Otherwise, it uses the default size of 128
- bytes.
- When you open a file for sequential access, BASIC also allocates
- string memory for a buffer. 512 bytes are used by default, though you can
- override that with the optional LEN = argument. Specifying a buffer size
- with non-random files will be discussed later in this chapter.
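-
-
- As a brief illustration of the LEN = argument in both roles, consider the
- two statements below. The file names and sizes are hypothetical and were
- chosen only for the example.
-
-
- 'sequential file: LEN = sets the size of BASIC's buffer
- OPEN "BIGFILE.TXT" FOR INPUT AS #1 LEN = 4096
-
- 'random access file: LEN = sets the record length
- OPEN "CUST.DAT" FOR RANDOM AS #2 LEN = 64
-
-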
- Note that BASIC PDS does not create a buffer when a file is opened for
- random access and you are using far strings. If a subsequent FIELD
- statement is then used, the fielded strings themselves comprise the buffer.
- Otherwise, BASIC assumes you will be reading the data into a TYPE variable,
- and avoids the extra buffering altogether. Also, file buffers in a BASIC
- PDS program are always stored in string memory, which is not necessarily
- DGROUP. If you are in the QBX environment or have compiled with the /fs
- far strings option, all file buffers will be stored in the far string data
- segment.
- Although BASIC's additional file buffering does improve your program's
- speed, it also comes at a cost: the buffers take away from string memory,
- and the only way to release their memory is to flush their contents to disk
- by closing the file. DOS offers a service to purge a file's buffers, to
- ensure that the data will be intact even if the program is terminated
- abnormally or the power is turned off. Therefore, it is considered good
- practice to periodically close a file during long data entry sessions. But
- closing the file and then reopening it after writing each record takes a
- long time, and more than negates any advantage offered by BASIC's added
- buffering. [Also, the DOS service that flushes a file's buffers does *not*
- flush BASIC's buffers. Any data your program has written that is still
- pending in a BASIC buffer will not be written to the file by this service.]
- It is interesting to note that BASIC always closes all open files when
- a program ends, so it is not strictly necessary to do that manually. I
- mention this only because you can save a few bytes by eliminating the CLOSE
- command. Also, DOS flushes its buffers and closes all open files when a
- program ends, so a few bytes can be saved this way even with non-BASIC
- programs. Again, I am not necessarily recommending that you do this, and
- some programmers would no doubt disagree with such advice. But the fact is
- that an explicit CLOSE is not truly needed.
-
-
- FILE ACCESS METHODS
- ===================
-
- BASIC offers three fundamental methods for accessing files, and these are
- specified when the file is opened. There are also several variations and
- options available with each method, and these will be discussed in more
- detail in the sections that describe each method.
- The first access method is called Sequential, because it requires you
- to read from or write to the file in a continuous stream. That is, to read
- the last item in a sequential file you must read all of the items that
- precede it. There are three different forms of OPEN for accessing
- sequential files.
- OPEN FOR OUTPUT creates the named file if it does not yet exist, or
- truncates it to a length of zero if it does. Once a file has been opened
- for output, you may only write data to it.
- OPEN FOR APPEND is related to OPEN FOR OUTPUT, and it also tells BASIC
- to open the file for writing. Unlike OPEN FOR OUTPUT, however, OPEN FOR
- APPEND does not truncate a file if it already exists. Rather, it opens the
- file and then seeks to the place just past the last byte. This way, data
- that is subsequently written will be appended to the end of the file. Note
- that OPEN FOR APPEND will also create a file if it does not already exist.
- OPEN FOR INPUT requires that the named file be present; otherwise, a
- "File not found" error will result. Once a file has been opened for input,
- you may only read from it.
- BASIC also offers the SEEK command to skip to any arbitrary position
- in the file, and SEEK can in fact be used with sequential files. However,
- sequential files are generally written using a comma or a carriage
- return/line feed pair, to indicate the end of each data item. Since each
- item can be of a varying length, it is difficult if not impossible to
- determine where in the file a given item begins. That is, if you wanted to
- read, say, the 200th line in a README file, how could you know where to
- seek to?
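-
-
- In practice the only reliable approach is to read and discard every line
- that comes before the one you want, as in the sketch below. It assumes a
- text file named README exists in the current directory.
-
-
- OPEN "README" FOR INPUT AS #1
- FOR Count% = 1 TO 200
-   IF EOF(1) THEN EXIT FOR              'the file has fewer than 200 lines
-   LINE INPUT #1, Text$                 'each read replaces the previous line
- NEXT
- CLOSE #1
-
- IF Count% > 200 THEN PRINT Text$       'the loop finished, so this is line 200
-
-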
- The second primary file access method is Random, and it allows you to
- read from and write to the file. When you use OPEN FOR RANDOM, BASIC knows
- that you will be accessing fixed-length blocks of data called *records*.
- The advantage of random access is that any record can be accessed by a
- record number, instead of having to read through the entire file to get to
- a particular location. That is, you can read or write any record randomly,
- without regard to where it is in the file. Because each record has the
- same physical length as every other record, it is easy for BASIC to
- calculate the location in the file to seek to, based on the desired record
- number and the fixed record length.
- Using random access is ideal for data that is already organized as
- fixed-length records such as you would find in a name and address database.
- Since each record contains the same amount of information, there is a
- natural one-to-one correspondence between the data and the record number in
- which it resides. For example, the data for customer number 1 would be
- stored in record number 1, customer 2 is stored in record 2, and so forth.
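-
-
- The calculation BASIC performs can be sketched as follows. The Customer
- type, its fields, and the file name are assumptions made for this example.
- Because BASIC numbers the first byte in a file as 1, record N begins at
- byte (N - 1) * RecordLength + 1.
-
-
- TYPE Customer
-   FullName AS STRING * 30
-   Balance AS DOUBLE
- END TYPE
- DIM Cust AS Customer
-
- OPEN "CUST.DAT" FOR RANDOM AS #1 LEN = LEN(Cust)
- RecNum& = 2
- GET #1, RecNum&, Cust                  'read customer number 2
-
- 'this is the byte position BASIC seeks to on your behalf
- Offset& = (RecNum& - 1) * LEN(Cust) + 1
- PRINT "Record"; RecNum&; "begins at byte"; Offset&
- CLOSE #1
-
-
- Each record here is 38 bytes long (30 for the name and 8 for the balance),
- so record 2 begins at byte 39.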
- Random access can also be used for text and other document files;
- however, that is much less common. Although this would let you quickly
- access any arbitrary line of text in the file, the tradeoff is a
- considerable waste of disk resources. For each line, space equal to the
- longest one must be set aside for all of them. In a typical document file
- line lengths will vary greatly, and it is wasteful to set aside, say, 80
- bytes for each line.
- The third access method is Binary, which is a hybrid of sequential and
- random access. A binary file is opened using the OPEN FOR BINARY command,
- and like random, BASIC lets you both read and write the file. Binary
- access is most commonly used when the data in the file is neither fixed-
- length in nature, nor delimited by commas or carriage returns. One example
- of a binary file is a Lotus 1-2-3 worksheet file. Each cell's contents
- follows a well-defined format, but varying types of information are
- interspersed throughout the file.
- For example, an 8-byte double-precision number may be followed by a
- variable length text field, which is in turn followed by the current column
- width represented as a 2-byte integer. Another example of binary
- information is the header portion of a dBASE data file. Although the data
- itself is of a fixed length, a block of data is stored at the beginning of
- every dBASE data file to indicate the number of fields in each file and
- their type. [Naturally, the length of this header will vary depending on
- the number of fields in each record.] An example program to read Lotus
- worksheet files is given later in this chapter, and a program to read and
- process dBASE files is shown in Chapter 7.
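-
-
- To show the general idea in miniature, the sketch below writes and then
- reads back a double-precision value, a variable-length text field, and an
- integer column width. The layout and the file name are hypothetical, and
- this is *not* the actual Lotus worksheet format.
-
-
- Value# = 1234.5
- Text$ = "Quarterly total"
- TextLen% = LEN(Text$)
- ColWidth% = 12
-
- OPEN "MIXED.DAT" FOR BINARY AS #1
- PUT #1, 1, Value#            '8-byte double-precision number
- PUT #1, , TextLen%           '2-byte length of the text that follows
- PUT #1, , Text$              'the variable-length text itself
- PUT #1, , ColWidth%          '2-byte integer column width
-
- GET #1, 1, NewValue#         'read the items back in the same order
- GET #1, , NewLen%
- NewText$ = SPACE$(NewLen%)   'size the string to receive the text
- GET #1, , NewText$
- GET #1, , NewWidth%
- CLOSE #1
-
-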
- Note that BASIC imposes its own rules on what you may and may not do
- with each file access method. This is unfortunate, because DOS itself has
- no such restrictions. That is, DOS allows you to open a file for output,
- and then freely read from the same file. To do this with BASIC you must
- first close the file, and then open it again for input. You can bypass
- BASIC entirely if you want, to open files and then read and write them.
- This requires using CALL Interrupt, and examples of doing this will be
- shown in Chapter 12.
- BASIC offers two different forms of the OPEN command. The more common
- method--and the one I prefer--is as follows:
-
- OPEN FileName$ FOR OUTPUT AS #FileNum [LEN = Length]
-
- Of course, OUTPUT could be replaced with RANDOM, BINARY, INPUT, or APPEND.
- The other syntax is more cryptic, and it uses a string to specify the file
- mode. To open a file for output using the second method you'd use this:
-
- OPEN "O", #FileNum, FileName$ [, Length]
-
- The first syntax is available only in QuickBASIC and the other current
- versions of the BASIC compiler. The second is a holdover from GW-BASIC,
- and according to Microsoft is maintained solely for compatibility with old
- programs. The available single-letter mode designators are "O" for output,
- "I" for input, "R" for random, "A" for append, and "B" for binary. Note
- that "B" is not supported in GW-BASIC, and was added beginning with
- QuickBASIC version 4.0.
- Besides being more obscure and harder to read, the older syntax does
- not let you specify the various access and sharing options available in the
- newer syntax. One advantage of the older method is that you can defer the
- open mode until the program runs. That is, a string variable can be used
- to determine how the file will be opened. However, there are few
- situations I can envision where that would be useful. Of course, the
- choice is yours, and some programmers continue to use the original version.
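-
-
- For completeness, deferring the open mode with the older syntax might look
- like the sketch below. AddToLog% is a hypothetical flag set elsewhere in
- the program, and the file name is likewise assumed.
-
-
- IF AddToLog% THEN Mode$ = "A" ELSE Mode$ = "O"
- OPEN Mode$, #1, "REPORT.TXT"
- PRINT #1, "Report complete"
- CLOSE #1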
-
-
- FILE MANIPULATION STATEMENTS
- ============================
-
- BASIC offers a number of different statements for opening and manipulating
- files. In a few cases, the same command may have different meanings,
- depending on how the file is opened. For example LEN = mentioned earlier
- assumes a different default value when a file is opened for random access
- compared to when it is opened for output. Similarly, GET # may or may not
- accept or require a variable name and optional seek offset, depending on
- the file mode. Therefore, pay close attention to each statement as it is
- described in the sections that follow. Specific differences will be listed
- as they relate to each of the various file access methods.
-
-
- OPENING AND CLOSING FILES
-
- Before any file or device may be accessed, it must first be opened with
- BASIC's OPEN statement. When you use OPEN, it is up to you to make up a file
- number that will be used when you reference the file later. If you use
- OPEN "MYDATA" FOR OUTPUT AS #1, then you will also use the same file number
- (1) when you subsequently print to the file. For example, you might use
- PRINT #1, Any$. Initially, it might appear that letting the programmer
- determine his or her own file numbers is a feature. After all, you are
- allowed to make up your own variable names, so why not file numbers too?
- Indeed, BASIC is rare among the popular languages in this regard; both C
- and Pascal require that the programmer remember a file number that is given
- to them.
- There are several problems with BASIC's use of file numbers, and in
- fact DOS does not use this method either. Instead, DOS returns a *file
- handle* when a file has been successfully opened. When an assembly
- language program (or BASIC itself) calls DOS to open a file, it is DOS who
- issues the number, and not the program. BASIC must therefore maintain a
- translation table to relate the numbers you give to the actual handles that
- DOS returns. This table requires memory, and that memory is taken from
- DGROUP.
- But there is another, more severe problem with BASIC's use of file
- numbers instead of DOS handles, because it is possible that you could
- accidentally try to open more than one file using the same number. In a
- small program that opens only one or two files, it is not difficult to
- remember which file number goes with which file. But when designing
- reusable subroutines that will be added to more than one program, it is
- impossible to know ahead of time what file numbers will be in use.
- To solve this problem, Microsoft introduced the FREEFILE function with
- QuickBASIC 4.0. FREEFILE was described in Chapter 4, but it certainly
- bears a brief mention again here. Each time you use FREEFILE it returns
- the next available file number, based on which numbers are already taken.
- Therefore, any subroutine that needs to open a file can use the number
- FREEFILE returns, confident that the number is not already in use.
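-
-
- A reusable subprogram built this way might look like the following sketch.
- The WriteLog name and the log file name are assumptions made for the
- example.
-
-
- CALL WriteLog("Program started")
- END
-
- SUB WriteLog (Message$)
-   FileNum% = FREEFILE                  'ask BASIC for an unused file number
-   OPEN "PROGRAM.LOG" FOR APPEND AS #FileNum%
-   PRINT #FileNum%, Message$
-   CLOSE #FileNum%
- END SUB
-
-
- Because the subprogram never hard-codes a file number, it can be added to
- any program regardless of which files the caller already has open.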
- Unless you specify otherwise, a file that has been opened for RANDOM
- or BINARY can be both read from and written to. The ACCESS option of the
- OPEN statement lets you indicate that a random or binary file may be read
- or written only. Although you may explicitly ask for both READ and WRITE
- access when the file is opened, read/write permission is the default. In some
- cases you may need to open a file for binary access, and also prevent your
- program from later writing to it. In that case you would use the ACCESS
- READ option.
- Likewise, specifying ACCESS WRITE tells BASIC to let your program
- write to the file, but prevent it from reading. This may seem nonsensical,
- but one situation in which write-only access might be desirable is when
- designing a network mail system. In that case it is quite likely that a
- program would be permitted to send mail to another user's electronic
- "mailbox", but not be allowed to read the mail contained in that file. The
- various ACCESS options are intended for use with any version of DOS higher
- than 2.0.
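- The fragment below is a small sketch of the ACCESS READ case just
- described. The file name and the 16-byte header length are assumptions
- chosen only for illustration.
-
```basic
'----- open an existing file for binary access, but for reading only
DEFINT A-Z
FileNum = FREEFILE
OPEN "MYFILE.DAT" FOR BINARY ACCESS READ AS #FileNum   'assumed file name

Header$ = SPACE$(16)             'GET reads as many bytes as the string holds
GET #FileNum, 1, Header$         'reading is permitted
'PUT #FileNum, 1, Header$        'writing would cause a runtime error
CLOSE #FileNum
```
-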
- Frankly, these ACCESS options are pointless, because if you wrote the
- program then you can control whether the file is read from or written to.
- If you are writing the Send Mail portion of a network application, then you
- would disallow reading someone else's mail as part of the program logic.
- And if you do open a file for ACCESS WRITE, BASIC will generate an error if
- you later try to read from it. So I personally don't see any real value in
- using these ACCESS arguments.
- The remaining two OPEN options are LOCK and SHARED, and these are
- meant for use with shared files under DOS 3.0 or later. Shared access is
- primarily employed on a network, though it is possible to share files on a
- single computer. This could be the case when a file needs to be accessed
- by more than one program when running under a task-switching program such
- as Microsoft Windows.
- You can specify that a file is to be shared by simply adding the
- SHARED clause to the OPEN statement. Thus, another program could both read
- and write the file, even while it is open in your program. To specify
- shared access but prevent other programs from writing to the file you would
- use LOCK WRITE. Similarly, using LOCK READ lets another program write to
- the file but not read from it, and LOCK READ WRITE prevents both.
- The LOCK statement can optionally be used on a shared file that is
- already open, to prohibit another program from accessing it at certain
- times only. The LOCK statement allows all or just a portion of a file to be
- locked, and the UNLOCK statement releases the locks that were applied
- earlier. Please understand that these network operations are described
- here just as a way to introduce what is possible. Network and database
- programming will be described in depth in Chapter 7.
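- As a brief illustration of these options, the sketch below opens a file
- for shared access and then locks a single record while it is updated. The
- file name, the CustRec layout, and the record number are all hypothetical.
- On a stand-alone PC, file sharing and locking generally also depend on the
- DOS SHARE program being loaded.
-
```basic
'----- hypothetical record layout and file name
DEFINT A-Z
TYPE CustRec
  Company AS STRING * 30
  Balance AS DOUBLE
END TYPE
DIM Cust AS CustRec

OPEN "CUSTOMER.DAT" FOR RANDOM SHARED AS #1 LEN = LEN(Cust)
'other forms of the lock clause in OPEN:
'  ... FOR RANDOM LOCK WRITE AS #1 ...        others may read but not write
'  ... FOR RANDOM LOCK READ WRITE AS #1 ...   others are locked out entirely

LOCK #1, 5                       'reserve record 5 while it is being updated
GET #1, 5, Cust
Cust.Balance = Cust.Balance + 100
PUT #1, 5, Cust
UNLOCK #1, 5                     'the arguments must match the LOCK exactly
CLOSE #1
```
-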
- Finally, you close an open file using BASIC's CLOSE command. CLOSE
- accepts one or more file numbers separated by commas, or no numbers at all,
- in which case every open file is closed. You can also use the
- RESET command to close all currently open files. When a file that has been
- opened for one of the output modes is closed, its file buffer is flushed to
- disk and DOS updates the directory entry for that file to indicate the
- current date and time and new file size. Closing any type of file releases
- the buffer memory back to BASIC's string memory pool for other uses.
-
-
- READING AND WRITING DATA
-
- Once a file has been opened you can read from it, write to it, or both,
- depending on what form of OPEN was used. Any file that has been opened for
- input may be read from only. Unlike the BASIC-related limitations I
- mentioned earlier, DOS imposes this restriction, and for obvious reasons.
- However, when you open a file for output or append, it is BASIC that
- prevents you from reading back what you wrote. BASIC imposes several other
- unfortunate limitations regarding what you can and cannot do with an open
- file, as you will see momentarily.
- Sequential access is commonly used with devices as well as with files.
- Although it is possible to open a printer for random access, there is
- little point since data is always printed sequentially. Similarly, reading
- from the keyboard or writing to the screen must be sequential. In the
- discussions that follow, you can assume that what is said about accessing
- files also applies to devices, unless otherwise noted.
-
-
- Sequential Output
-
- Data is written to a sequential file using the PRINT # statement, which
- has the same syntax as the normal PRINT statement used for printing to the
- display screen. That is, PRINT # accepts an optional semicolon to suppress
- a carriage return and line feed from being written to the file, or a comma
- to indicate that one or more blank spaces are to be written after the data.
- The number of blanks sent to the file depends on the current print
- position, just like when printing to the screen.
- You can also use the WRITE # statement to print data to a sequential
- file, but I recommend against using WRITE in most situations. Unlike PRINT,
- which merely sends the data you give it, WRITE adds surrounding quotes to
- all string data, which takes time and also additional disk space. Since a
- subsequent INPUT from the file will just have to remove those quotes, which
- takes even more time, what's the point? Further, WRITE does not let you
- specify a trailing semicolon or comma. Although a comma may be used as a
- delimiter between items written to disk, the comma is stored in the file
- literally when WRITE is used.
- The only time I can see WRITE being useful is for printing data that
- will be read by a non-BASIC application that explicitly requires this
- format. Many database and spreadsheet programs let you import comma-
- delimited data with quoted strings such as WRITE uses. These programs
- treat each complete line ending with a carriage return as an entire record,
- and each comma-delimited item within the line as a field in that record.
- But you should avoid WRITE unless your program really needs to communicate
- with other such applications, because it results in larger data files and
- slower performance.
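- The difference between the two statements is easiest to see by writing
- the same data both ways. This short sketch uses a hypothetical file name
- and made-up values; the comments show what each statement places in the
- file.
-
```basic
'----- compare WRITE # and PRINT # output for the same data
OPEN "TEST.TXT" FOR OUTPUT AS #1     'assumed file name
Work$ = "Smith, John"
Amount% = 125

WRITE #1, Work$, Amount%     'quotes the string, adds a comma: "Smith, John",125
PRINT #1, Work$; Amount%     'sends the data as is:            Smith, John 125
CLOSE #1
```
-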
- Another use for WRITE is to protect strings that contain commas from
- being read incorrectly by a subsequent INPUT statement. INPUT uses commas
- to delimit individual strings, and the quotes allow you to input an entire
- string with a single INPUT command. But BASIC's LINE INPUT does this
- anyway, since it reads an entire line of text up to a terminating carriage
- return. You could also add the quotes manually when needed:
-
-
- IF INSTR(Work$, ",") THEN
-   PRINT #1, CHR$(34); Work$; CHR$(34)
- ELSE
-   PRINT #1, Work$
- END IF
-
-
- You may also use TAB and SPC to format the output you print to a file or
- device. For the most part, TAB and SPC operate like their non-file
- counterparts, including the need to add an extra empty PRINT to force a
- carriage return at the end of a line. That is, when you use
-
- PRINT Any$; TAB(20)
- or
- PRINT #1, SomeVar; SPC(13)
-
- BASIC adds a trailing semicolon whether you want it or not. To force a new
- line at that point in the printing process requires an additional PRINT or
- PRINT # statement. This is not so much a nuisance as yet another code
- bloater, since an empty PRINT adds 9 bytes of compiler-generated code
- and an empty PRINT # adds 18 bytes.
- One important difference between the screen and file versions of TAB
- and SPC is the way long strings are handled. If you use TAB or SPC in a
- PRINT statement that is then followed by a string too long to fit on the
- current line, the screen version will advance to the next row, and print
- the string at the left edge. This is probably not what you expected or
- wanted. When printing to a file, however, the string is simply written
- without regard to the current column. Column 80 is the default width for
- the screen and printer when they have been opened as devices, though you
- may change that using WIDTH.
- The WIDTH statement lets you specify at which column BASIC is to
- automatically add a carriage return/line feed pair. The default for a
- printer is at column 80. In most programming situations this behavior is a
- nuisance, since many printers can accommodate 132 columns. After all, why
- shouldn't you be allowed to print what you want when you want, without
- BASIC intervening to add unexpected and often unwanted extra characters?
- Most programmers disable this automatic line wrapping by using WIDTH #
- FileNum, 255 if the printer was opened as a device, or WIDTH LPRINT 255 if
- using LPRINT statements.
- Curiously, this special value is not mentioned anywhere in the
- otherwise very complete documentation that comes with BASIC PDS. In fact,
- using a width value of 255 is mandatory if you intend to send binary data
- to a printer. Most modern printers accept both graphics commands and
- downloadable fonts. Since either of these will no doubt result in strings
- longer than 80 or even 255 characters, it is essential that you have a way
- to disable the "favor" that BASIC does for you. Undoubtedly, the automatic
- addition of a carriage return and line feed goes back to the early days of
- primitive printers that required this. The only reason Microsoft continues
- this behavior is to assure compatibility with programs written using
- earlier versions of BASIC.
- Related to the WIDTH anomaly is BASIC's insistence on adding a
- CHR$(10) line feed whenever you print a CHR$(13) carriage return to a
- device. Again, this dubious feature is provided on the assumption that you
- would always want a line feed after every carriage return. But there are
- many cases where you wouldn't, such as the font and graphics examples
- mentioned earlier. If you add the "BIN" (binary) option when opening a
- printer, you can prevent BASIC from forcing a new line every 80 columns,
- and also suppress the addition of a line feed following each carriage
- return. For example, OPEN "LPT1:BIN" FOR OUTPUT AS #1 tells BASIC to open
- the first parallel printer in binary mode.
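- The three approaches described above can be sketched as follows. This
- assumes a printer is attached to LPT1, and the strings being printed are
- invented for the example.
-
```basic
'----- printer opened as a device: disable wrapping for that file number
OPEN "LPT1:" FOR OUTPUT AS #1
WIDTH #1, 255
PRINT #1, STRING$(132, "-")          'a 132-column rule sent as one line
CLOSE #1

'----- the equivalent when using LPRINT
WIDTH LPRINT 255
LPRINT STRING$(132, "-")

'----- binary mode: no wrapping, and no line feed added after CHR$(13)
OPEN "LPT1:BIN" FOR OUTPUT AS #1
PRINT #1, "Total"; CHR$(13); "_____";    'return to column 1 and overprint
PRINT #1, CHR$(13); CHR$(10);            'end the line explicitly
CLOSE #1
```
-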
- The PRINT # USING statement lets you send formatted numeric data to a
- file, in the same way you would use the regular PRINT USING to format
- numbers on the screen. PRINT # USING accepts the same set of formatting
- commands as PRINT USING, allowing you to mix text and formatted numbers in
- a single PRINT operation. If your program will be printing formatted
- reports from the disk file later, I recommend using PRINT USING at that
- time, instead of when writing the data to disk. Otherwise, the extra
- spaces and other formatting information are added to the file increasing
- its size. In fact, PRINT # USING is really most appropriate when printing
- to a device such as a printer.
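- For example, a formatted report line might be sent to the printer as
- shown in this sketch. The variable names, values, and format string are
- assumptions chosen for illustration, and a printer is assumed on LPT1.
-
```basic
'----- send one formatted report line to the printer
OPEN "LPT1:" FOR OUTPUT AS #1
Item$ = "Widget"
Qty% = 12
Price# = 1234.5

'\        \ is a 10-character string field, ### an integer field, and
'$$#,###.## a dollar amount with a floating dollar sign and commas
PRINT #1, USING "\        \ ###  $$#,###.##"; Item$; Qty%; Price#
CLOSE #1
```
-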
- Finally, it is worth stressing the importance of selecting a
- suitable buffer size. As I described earlier, BASIC and DOS employ an area
- of memory as a buffer to hold information on its way to and from disk.
- This way information can often be written to or read from memory, instead
- of having to access the physical disk each time. Besides the buffers that
- DOS maintains, BASIC provides additional buffering when your program is
- using sequential input or output.
- BASIC lets you control the size of this buffer, using the LEN = option
- of the OPEN statement. In general, the larger you make the buffer, the
- faster your programs will read and write files. The trade-off, however, is
- that BASIC's buffers are stored in string memory. With QuickBASIC and near
- strings in BASIC PDS, the buffer is located in DGROUP. When BASIC PDS far
- strings are used, the buffer is in the same segment that the current module
- uses for string storage.
- Conversely, you can actually reduce the default buffer size when
- string space is at a premium, but at the expense of disk access speed.
- When using OPEN FOR INPUT and OPEN FOR OUTPUT, BASIC sets aside 512 bytes
- of string memory for the buffer, unless you specify otherwise. If you have
- many sequential files open at once you could reduce the buffer sizes to 128
- bytes, for a net savings of 384 bytes for each file. The legal range of
- values for LEN = is between 1 and 32767 bytes.
- Notice that the best buffer sizes are a power of two and, when
- increasing the buffer size, a multiple of 512. Since a disk
- sector is almost always 512 bytes, DOS will fill the buffer with an entire
- sector. In fact, DOS always reads and writes entire sectors anyway. If
- you use a buffer size of, say, 600 bytes, DOS will have to read 1024 bytes
- just to get the first portion of the second sector. But when more data is
- needed later, BASIC will then have to go back and ask DOS for the same
- information again. By reading entire sectors or evenly divisible portions
- of a sector, you can avoid having BASIC and DOS read the same information
- more than once.
- Even though larger buffers usually translate to better performance,
- you will eventually reach the point of diminishing returns, beyond which
- little performance improvement will result. Table 6-1 shows the timing
- results with various buffer sizes when reading a 104K BASIC source file
- using LINE INPUT. Understand that this test is informal, and merely shows
- the results obtained using only one PC. In particular, the hard disk
- results are for a fairly fast (17 millisecond) 150 MB ESDI drive and a PC
- equipped with a 25 MHz 386. Therefore, the improvement from a larger
- buffer is less than you would get on a slower computer with a slower hard
- disk or with a floppy disk. Many older XT and AT compatible PCs will
- probably fall somewhere between the results shown here for the hard and
- floppy disks. Notice that while the improvement actually seems somewhat
- worse for some increases, this can be attributed to the lack of resolution
- in the PC's system timer.
-
- Fast ESDI hard disk:
-
-    Buffer Size (in bytes)    Seconds
-    ----------------------    -------
-              64               2.699
-             128               2.420
-             256               2.410
-             512               2.420
-            1024               2.311
-            2048               2.139
-            4096               2.201
-            8192               2.080
-           16384               2.039
-
-
- 360K floppy disk:
-
-    Buffer Size (in bytes)    Seconds
-    ----------------------    -------
-              64              45.260
-             128              45.141
-             256              45.148
-             512              45.150
-            1024              27.180
-            2048              18.180
-            4096              13.570
-            8192              11.650
-           16384              11.371
-
- Table 6-1: Timing Results For Sequential Reading Versus Buffer Size.
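-
- A comparable test is easy to run on your own equipment. The sketch below
- is not the program that produced Table 6-1; it is a minimal illustration
- that reads a hypothetical file named BIGFILE.TXT once for each buffer size
- and reports the elapsed time. Any disk caching software that is installed
- will affect the results.
-
```basic
'----- time sequential reads of one file using various buffer sizes
DEFINT A-Z
FileName$ = "BIGFILE.TXT"            'assumed test file; must already exist
BufSize = 64

DO
  Start! = TIMER
  OPEN FileName$ FOR INPUT AS #1 LEN = BufSize
  DO WHILE NOT EOF(1)
    LINE INPUT #1, Work$             'read and discard each line
  LOOP
  CLOSE #1
  PRINT BufSize; "bytes:"; TIMER - Start!; "seconds"
  IF BufSize = 16384 THEN EXIT DO    'stop before the integer overflows
  BufSize = BufSize * 2
LOOP
```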
-
- It is important to point out that a buffer is created only for sequential
- input and output, and also for random files with QuickBASIC. Opening a
- file for random access with BASIC PDS [and I'll presume VB/DOS] does not
- create a buffer, nor does opening a file for binary with either version.
- Further, with random access files a buffer is created by QuickBASIC only
- when FIELD is used, and the buffer is located within the actual fielded
- strings. Therefore, the LEN = argument in an OPEN FOR RANDOM statement
- merely tells BASIC how to calculate record offsets when SEEK and GET are
- used.
-
-
- Sequential Input
-
- Sequential data is read using INPUT #, LINE INPUT #, or INPUT$ #. Like the
- console form of INPUT, INPUT # can be used to read one or more variables of
- any type and in any order with a single statement. When reading a file,
- INPUT # recognizes both the comma and the carriage return as a valid
- delimiter, to indicate the end of one variable. This is in contrast to the
- regular [keyboard] version of INPUT, which issues a "Redo from start" error
- if the wrong number of comma-delimited variables are entered. Instead,
- INPUT # simply moves on to the next line for the remaining variables.
- LINE INPUT # avoids this entirely, and simply reads an entire string
- without regard to commas until a carriage return is encountered. This
- precludes LINE INPUT # from being used with anything but string variables.
- However, LINE INPUT # can be used with fixed- as well as variable-length
- strings, without the overhead of copying from one type to the other that
- BASIC usually adds. [This copying was described in Chapter 2.] As with
- INPUT #, LINE INPUT # strips leading and trailing quotes from the line if
- they are present in the file.
- The last method for reading a sequential file or device is with the
- INPUT$ # function. INPUT$ # is used to read a specified number of
- characters, without regard to their meaning. Where commas and carriage
- returns are normally used to delimit each line of text, INPUT$ returns them
- as part of the string. INPUT$ # accepts two arguments--the number of
- characters to read and the file number--and returns those characters as a
- string. To read, say, 20 bytes from a sequential file that has been opened
- as #3, you would use Any$ = INPUT$(20, #3). Although the pound sign (#) is
- optional, I prefer to include it to avoid confusion as to which parameter
- is the file number and which is the number of bytes.
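- To contrast the three input methods, this short sketch writes the same
- line to a hypothetical file three times and then reads each copy back a
- different way. The file and variable names are assumptions.
-
```basic
'----- create a small test file, then read it three different ways
OPEN "TEST.TXT" FOR OUTPUT AS #1     'assumed file name
FOR X% = 1 TO 3
  PRINT #1, "Smith, John"
NEXT
CLOSE #1

OPEN "TEST.TXT" FOR INPUT AS #1
INPUT #1, Last$, First$        'the comma delimits: "Smith" and "John"
LINE INPUT #1, Whole$          'the entire line: "Smith, John"
Raw$ = INPUT$(5, #1)           'exactly five characters: "Smith"
CLOSE #1

PRINT Last$
PRINT First$
PRINT Whole$
PRINT Raw$
```
-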
- As with sequential output, specifying a larger buffer size than the
- default 512 bytes can greatly improve the speed of INPUT # and LINE INPUT #
- statements, but at the expense of string memory.
-
-
- Random Access
-
- Unlike sequential files that are almost always read starting at the
- beginning, data in a random access file can be accessed literally in any
- arbitrary order. Random access files are comprised of fixed-length
- *records*, and each record contains one or more *fields*. The most common
- application of random access techniques is in database programs, where each
- record holds the same type of information as the next. For example, a
- customer name and address database is comprised of a first name, a last
- name, a street address, city, state, and zip code. Even though different
- names and addresses will be stored in different records, the format and
- length of the information in each record is identical.
- BASIC provides two different ways to handle random access files: the
- FIELD statement and TYPE variables. Before QuickBASIC version 4.0, the
- FIELD method was the only way to define the structure of a random access
- data file. Although Microsoft has publicly stated that FIELD is provided
- in current versions of BASIC only for compatibility with older programs, it
- has several important properties that cannot be duplicated in any other
- way. FIELD also lets you perform some interesting and non-obvious tricks
- that have nothing to do with reading or writing files. These are described
- later in this chapter in the section *Advanced File Techniques*.
- Once a file has been opened for RANDOM you may use the FIELD statement
- by specifying one or more string variables to hold each field, along with
- their length. A typical example showing the syntax for the FIELD statement
- is as follows:
-
-
- OPEN FileName$ FOR RANDOM AS #1 LEN = 97
- FIELD #1, 17 AS LastName$, 14 AS FirstName$, 32 AS Address$, 15 AS City$, _
- 2 AS State$, 9 AS Zip$, 8 AS BalanceDue$
-
-
- Here, the file is opened for random access, and the record length is
- established as being 97 characters. This allows room for each of the
- fields in the FIELD statement. In this case 17 characters are set aside
- for the last name, 14 for the first name, 32 for the street address, 15 for
- the city, 2 for the state, 9 for the zip code, and 8 for the double
- precision balance due value. I often use a field length of 32 characters
- for name and address data, because that's how many can fit comfortably on a
- standard 3-1/2 by 15/16 inch mailing label. (The first and last names
- above add up to 32 characters, including a separating blank space.)
- Note that the underscore shown above is used here as a line continuation
- character, and you'd actually type the entire statement as one long line.
- In fact, in most cases a FIELD statement must be able to fit entirely on a
- single line, and there is no direct way to continue the list of variables.
- Although the BC compiler recognizes an underscore to continue a line as
- shown here, the BASIC environment does not. Underscores in a source file
- are removed by the BASIC editor when the file is loaded, and the lines are
- then combined.
- If a second FIELD statement for the same file number is given on a
- separate line, the additional strings specified are placed starting at the
- beginning of the same buffer. While it is possible to coerce a new FIELD
- statement to begin farther into the buffer, that requires an additional
- dummy string variable:
-
-
- FIELD #1, 17 AS LastName$, 14 AS FirstName$
- FIELD #1, 31 AS Dummy$, 32 AS Address$, 15 AS City$
- FIELD #1, 78 AS Dummy2$, 2 AS State$, 9 AS Zip$
-
-
- Here, the dummy strings are used as placeholders to force the Address$ and
- State$ variables farther into the buffer, and you would not refer to the
- dummy strings in your program.
- Once a field buffer has been defined, special precautions are needed
- when assigning and reading the fielded string variables. As you know,
- BASIC often moves strings around in memory when they are assigned.
- However, that would be fatal if those strings are in a field buffer. A
- field buffer is written to disk all at once when you use PUT, and it is
- essential that all of the strings therein be contiguous. If you simply
- assign a variable that is part of a field buffer, BASIC may move the string
- data to a new location outside of the buffer and your program will fail.
- To avoid this problem you must assign fielded strings using either
- LSET, RSET, or the statement form of MID$. These BASIC commands let you
- insert characters into a string, so BASIC will not have to claim new string
- memory. This further contributes to FIELD's complexity, and it also adds
- slightly to the amount of code needed for each assignment. For example,
- the statement One$ = Two$ generates 13 bytes of compiled code, and the
- statement LSET One$ = Two$ creates 17. Although LSET is generally faster
- than a direct assignment, it is important to understand that it also
- creates more code. But the situation gets even worse.
- Because all of the variables in a field buffer must be strings,
- additional steps are needed to assign numeric variables such as integer and
- double precision. The CVI and MKS$ family of BASIC functions are needed to
- convert numeric data to their equivalent in string form and back. There
- are eight of these functions in QuickBASIC with two each for integer, long
- integer, single precision, and double precision variables. BASIC PDS adds
- two more to support the Currency data type. All of the various conversion
- functions have names that start with the letters MK or CV, and a complete
- list can be found in your BASIC manual.
- To convert a double precision variable to equivalent data in an 8-byte
- string you would use MKD$, and to convert a 2-byte string that holds an
- integer to an actual integer value you would use CVI. MKD$ stands for
- "Make Double into a string" and it has a dollar sign to show that it
- returns a string. CVI stands for "Convert to Integer" and the absence of a
- dollar sign shows that it returns a numeric value. Combined with the
- requisite LSET, a complete assignment prior to writing a record to disk
- with PUT would be something like this: LSET BalanceDue$ = MKD$(BalDue#).
- And if a record has just been read using GET, an integer value in the field
- buffer could be retrieved using code such as MyInt% = CVI(IntVar$).
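- Putting these pieces together, the following sketch writes one record
- that holds an integer and a double precision value, and then reads it
- back. The file name and the two-field layout are hypothetical.
-
```basic
'----- store and retrieve numeric values through a field buffer
DEFINT A-Z
OPEN "BALANCE.DAT" FOR RANDOM AS #1 LEN = 10    'assumed file name
FIELD #1, 2 AS IntVar$, 8 AS BalanceDue$

LSET IntVar$ = MKI$(1234)            'integer to a 2-byte string
LSET BalanceDue$ = MKD$(4029.8#)     'double to an 8-byte string
PUT #1, 1                            'write the buffer as record 1

GET #1, 1                            'read record 1 back into the buffer
MyInt = CVI(IntVar$)                 '2-byte string back to an integer
BalDue# = CVD(BalanceDue$)           '8-byte string back to a double
PRINT MyInt; BalDue#
CLOSE #1
```
-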
- The need for LSET, RSET, CVI, and MKS$ and so forth has historically
- made learning random access file techniques one of the most difficult and
- messy aspects of BASIC programming. Besides having to learn all of the
- statements and how they are used, you also need to understand how many
- bytes each numeric data type occupies to set aside the correct amount of
- space in the field buffer. Further, a lot of compiled code is created to
- convert large amounts of data between numeric and string form. For these
- and other reasons, Microsoft introduced the TYPE variable with its release
- of QuickBASIC 4.0.
- The TYPE method allows you to establish a record's structure by
- defining a custom variable that contains individual components for each
- field in the record. In general, using TYPE is a much clearer way to
- define a record, and it also avoids the added library code to handle the
- FIELD, LSET, CVI, and MKS$ statements. When you use AS INTEGER and AS
- DOUBLE and so forth to define each portion of the TYPE, the correct number
- of bytes are allocated to store the value in its native fixed-length
- format. This avoids having to convert the data to and from string form.
- Using the earlier example, here's how you would define and assign the
- same record using a TYPE variable:
-
-
- TYPE Record
-   LastName AS STRING * 17
-   FirstName AS STRING * 14
-   Address AS STRING * 32
-   City AS STRING * 15
-   State AS STRING * 2
-   Zip AS STRING * 9
-   BalanceDue AS DOUBLE
- END TYPE
- DIM MyRecord AS Record
-
- MyRecord.LastName = LastName$
- MyRecord.FirstName = FirstName$
- MyRecord.Address = Address$
- MyRecord.City = City$
- MyRecord.State = State$
- MyRecord.Zip = Zip$
- MyRecord.BalanceDue = BalanceDue#
-
-
- Even though the same names are used for both the TYPE variable members and
- the strings they are being assigned from, you may of course use any names
- you want. You could also assign the portions of a TYPE variable from
- constants using MyRecord.Zip = "06896" or MyRecord.BalanceDue = 4029.80.
- Further, one entire TYPE variable may be assigned to another in a single
- operation using ThisType = ThatType. Dissimilar TYPE variables may be
- assigned using LSET like this: LSET MyType = YourType.
- As you can see, using TYPE variables instead of FIELD yields an
- enormous improvement in a program's clarity. However, there are still some
- programming problems that only FIELD can solve. One limitation of using
- TYPE variables is that the file structure must be known when the program is
- compiled, and you cannot defer this until runtime. Therefore, it is
- impossible to design a general purpose database program, in which a single
- program can manipulate any number of differently structured files. The
- compiler needs to know the length and type of data within a TYPE variable,
- in order to access the data it contains. So while you can use a variable
- as the LEN = argument with OPEN, the record structure itself must remain
- fixed.
- FIELD avoids that limitation because it accepts a variable number of
- arguments, and varying lengths within each field component. Therefore, by
- dimensioning a string array to the number of elements needed for a given
- record, the entire process of opening, fielding, reading, and writing can
- be handled using variables whose contents and type are determined at
- runtime. Some amount of IF testing will of course be required when the
- program runs, but at least it's possible to process a file using variable
- information.
- The following complete program first creates a random access file with
- five slightly different records using a TYPE variable. It then reads the
- file independently of the TYPE structure using the FIELD method. Although
- the second portion of the program uses DATA statements to define the file's
- structure, in practice this information would be read from disk. In fact,
- this is the method used by dBASE and Clipper files, based on the field
- information that is stored in a header portion of the data file.
-
- '----- create a data file containing five records
- DEFINT A-Z
-
- TYPE MyType
-   FirstName AS STRING * 17
-   LastName AS STRING * 14
-   DblValue AS DOUBLE
-   IntValue AS INTEGER
-   MiscStuff AS STRING * 20
-   SngValue AS SINGLE
- END TYPE
- DIM MyVar AS MyType
-
- OPEN "MYFILE.DAT" FOR RANDOM AS #1 LEN = 65
- MyVar.FirstName = "Jonathan"
- MyVar.LastName = "Smith"
- MyVar.DblValue = 123456.7
- MyVar.IntValue = 10
- MyVar.MiscStuff = "Miscellaneous stuff"
- MyVar.SngValue = 14.29
- FOR X = 1 TO 5
-   PUT #1, , MyVar
-   MyVar.DblValue = MyVar.DblValue * 2
-   MyVar.IntValue = MyVar.IntValue * 2
-   MyVar.SngValue = MyVar.SngValue * 2
- NEXT
- CLOSE #1
-
-
- '----- read the data without regard to the TYPE above
- READ FileName$, NumFields
- REDIM Buffer$(1 TO NumFields)        'holds the FIELD strings
- REDIM FieldType(1 TO NumFields)      'the array of data types
-
- RecLength = 0
- FOR X = 1 TO NumFields
-   READ ThisType
-   FieldType(X) = ThisType
-   RecLength = RecLength + ABS(ThisType)
- NEXT
-
- OPEN FileName$ FOR RANDOM AS #1 LEN = RecLength
-
- PadLength = 0
- FOR X = 1 TO NumFields
-   ThisLength = ABS(FieldType(X))
-   FIELD #1, PadLength AS Pad$, ThisLength AS Buffer$(X)
-   PadLength = PadLength + ThisLength
- NEXT
-
- NumRecs = LOF(1) \ RecLength         'calc number of records
- FOR X = 1 TO NumRecs                 'read each in sequence
-   GET #1                             'get the current record
-   CLS
-   FOR Y = 1 TO NumFields             'walk through each field
-     PRINT "Field"; Y; TAB(15);       'display each field
-     SELECT CASE FieldType(Y)         'see what type of data
-       CASE -8                        'double precision
-         PRINT CVD(Buffer$(Y))        'so use CVD
-       CASE -4                        'single precision
-         PRINT CVS(Buffer$(Y))        'as above
-       CASE -2                        'integer
-         PRINT CVI(Buffer$(Y))
-       CASE ELSE                      'string
-         PRINT Buffer$(Y)
-     END SELECT
-   NEXT
-   LOCATE 20, 1
-   PRINT "Press a key to view the next record ";
-   WHILE LEN(INKEY$) = 0: WEND
- NEXT
- CLOSE #1
- END
-
- DATA MYFILE.DAT, 6
- DATA 17, 14, -8, -2, 20, -4
-
- There are several issues that need elaboration in this program. First is
- the use of arrays to hold the fielded string data and also each field's
- type. When the field buffer is defined with an array, the same variable
- name can be used repeatedly in a loop. A parallel array that holds the
- field data types permits the program to relate the field data to its
- corresponding type of data. That is, Buffer$(3) holds the data for field
- 3, and FieldType(3) indicates what type of data it is.
- Second, the FieldType array uses a simple coding method that combines
- both the data type and its length into a single value. That is, positive
- values are used to indicate string data, and the value itself is the field
- length. Negative values reflect the data type as well as the length, using
- a negative version of that data type's length. Specifically, -8 is used to
- indicate a double precision field type, -4 a single precision type, and -2
- an integer. If you need to handle long integers or the BASIC PDS Currency
- data type, you'll need to devise a slightly different method. I chose this
- one because it is simple and effective.
- The final point worth mentioning when comparing FIELD to TYPE is that
- the field buffer is relinquished back to BASIC's string pool when the file
- is closed. But when a TYPE variable is dimensioned, the near memory it
- occupies is allocated by the compiler, and is never available for other
- uses. Although there is a solution, it requires some slight trickery. The
- statement REDIM TypeVar(1 TO 1) AS TypeName will create a 1-element TYPE
- array in far memory that can then be used as if it were a single TYPE
- variable. That is, any place you would have used the TYPE variable, simply
- substitute the sole element in the array.
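- A minimal sketch of this substitution follows. The TYPE layout and file
- name are hypothetical, and the ERASE at the end shows how the far memory
- can be released once the record is no longer needed.
-
```basic
'----- use a 1-element dynamic TYPE array in place of a TYPE variable
DEFINT A-Z
TYPE MyType
  LastName AS STRING * 17
  IntValue AS INTEGER
END TYPE

REDIM MyVar(1 TO 1) AS MyType        'dynamic array, stored in far memory
MyVar(1).LastName = "Smith"
MyVar(1).IntValue = 10

OPEN "TEST.DAT" FOR RANDOM AS #1 LEN = LEN(MyVar(1))   'assumed file name
PUT #1, 1, MyVar(1)                  'the element is used like a variable
CLOSE #1

ERASE MyVar                          'release the memory when finished
```
-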
- Understand that more code is required to access data in a dynamic
- array than in a static variable. For example, an integer assignment to a
- member of a dynamic TYPE array generates 17 bytes of code, compared to only
- 6 bytes for the same operation on a static TYPE. But when string space is
- more important than .EXE file size, this trick can make the difference
- between a program that runs and one that doesn't.
- Regardless of which method you use--TYPE or FIELD--there are several
- additional points to be aware of. First, the PUT # and GET # statements
- are used to write and read a random access file respectively. PUT # and
- GET # accept two different forms, depending on whether you are using TYPE
- or FIELD to define the record structure.
- When FIELD is used, PUT # and GET # may be used with either no
- argument to access the current record, or with an optional record number
- argument. That is, PUT #1 writes the current field buffer contents to disk
- at the current DOS SEEK position, and GET #1, RecNum reads record number
- RecNum into the buffer for subsequent access by your program.
- As with sequential files, each time a record is read or written, DOS
- advances its internal seek location to the next successive position in the
- file. Therefore, to read a group of records in forward order does not
- require a record number, nor does writing them in that order. In fact,
- slightly more time is required to access a record when a record number is
- given but not needed, because BASIC makes a separate call to perform an
- explicit Seek to that location in the file.
- When the TYPE method is used to access random access data, the record
- number is also optional, but you must provide the name of a TYPE variable
- or TYPE array element. In this case, the record number is still used as
- the first argument, and the TYPE variable is the second argument. If you
- omit the record number you must include an empty comma placeholder. For
- example, PUT #1, RecNum, TypeVar writes the contents of TypeVar to the file
- at record number RecNum, and GET #1, , TypeArray(X) reads the current
- record into TYPE array element X.
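-
- The following sketch shows both TYPE forms together. The Item TYPE, the
- ItemArray array, and the ITEMS.DAT file name are hypothetical.
-
- TYPE Item                             'hypothetical record layout
-   Code AS INTEGER
-   Descr AS STRING * 20
- END TYPE
- DIM Rec AS Item
- DIM ItemArray(1 TO 3) AS Item
-
- OPEN "ITEMS.DAT" FOR RANDOM AS #1 LEN = LEN(Rec)
- FOR X% = 1 TO 3
-   Rec.Code = X% * 100
-   Rec.Descr = "Item" + STR$(X%)
-   PUT #1, , Rec                       'next record; note the empty comma
- NEXT
-
- GET #1, 2, Rec                        'explicit record number
- PRINT Rec.Code; Rec.Descr
- GET #1, , ItemArray(3)                'reads the record that follows
- PRINT ItemArray(3).Code; ItemArray(3).Descr
- CLOSE #1
-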
- It is not essential that the TYPE variable be as long as the record
- length specified when LEN = was used with OPEN, but it generally should be.
- When a record number is given with PUT # or GET #, BASIC uses the original
- LEN = value to know where to seek to in the file. If a record number is
- omitted, BASIC will still advance to the next complete record even if the
- TYPE variable being read or written is shorter than the stated record
- length. In most cases, however, you should use a TYPE whose length
- corresponds to the LEN = argument unless you have a good reason not to.
- Notice that when LEN = is omitted, BASIC defaults to a record length
- of 128 bytes. Indeed, forgetting to include the length can lead to some
- interesting surprises. One clever trick that avoids having to calculate
- the record length manually is to use BASIC's LEN function. Although
- earlier versions of BASIC allowed LEN only in conjunction with string
- variables, QuickBASIC 4.0 and later versions recognize LEN for any type of
- data.
- For example, LEN(IntVar%) is always 2, and LEN(AnyDouble#) is always
- equal to 8. When LEN is used this way the compiler merely substitutes the
- appropriate numeric constant when it builds your program. Since LEN can
- also be used with TYPE variables and TYPE array elements, you can let BASIC
- do the byte counting for you. The brief program fragment below shows this
- in context.
-
-
- TYPE Something
- X AS INTEGER
- Y AS DOUBLE
- Z AS STRING * 100
- END TYPE
- DIM Anything AS Something
- OPEN MyData$ FOR RANDOM AS #1 LEN = LEN(Anything)
-
-
- In particular, this method is useful if you later modify the TYPE
- definition, since the program will be self-accommodating. Changing Z to
- STRING * 102 will also change the value used as the LEN = argument to OPEN.
- Be careful to use the actual variable name with LEN, and not the TYPE name
- itself. That is, LEN(Anything) will equal 110, but LEN(Something) will be
- 2 if DEFINT is in effect. When BASIC sees LEN(Something) it assumes you
- are referring to a variable with that name, not the TYPE definition.
- The only time this use of LEN will be detrimental is when it is used
- as a passed parameter many times in a program. Since LEN is treated in
- this case as a numeric constant, it is subject to the same copying issues
- that CONST values and literal numbers are. Therefore, you would probably
- want to assign a variable once from the value that LEN returns, and use
- that variable repeatedly later as described in Chapter 2.
-
-
- Binary Access
-
- Binary file access lets you read or write any portion of a file, and
- manipulate any type of information. Reading a sequential file requires
- that the end of each data item be identified by a comma, or a carriage
- return line feed pair. Random access files do not require special
- delimiters, and instead rely on a fixed record length to know where each
- record's data starts and ends. A binary file may be organized in any
- arbitrary manner; however, it is up to the programmer to devise a method
- for determining what goes where in the file.
- The overwhelming advantage of binary over sequential access is the
- enormous space and speed savings. A file that requires extra carriage
- returns or commas will be larger than one that does not. Moreover, numeric
- data in a binary file is stored in its native fixed-length format, instead
- of as a string of ASCII digits. Therefore, the integer value -32700 will
- occupy only two bytes, as opposed to the six needed for its sign and
- digits, plus one more for a comma or two more for a carriage return and
- line feed.
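-
- If you want to see the difference for yourself, a small experiment along
- these lines writes the same integer both ways and reports the resulting
- file sizes. The file names are arbitrary, and WRITE # is used here merely
- as one representative way to create the sequential version.
-
- DEFINT A-Z
- Value = -32700
-
- OPEN "NUMBER.TXT" FOR OUTPUT AS #1    'sequential: ASCII digits
- WRITE #1, Value
- CLOSE #1
-
- OPEN "NUMBER.BIN" FOR OUTPUT AS #1    'make sure this file starts empty
- CLOSE #1
- OPEN "NUMBER.BIN" FOR BINARY AS #1    'binary: native two-byte format
- PUT #1, , Value
- CLOSE #1
-
- OPEN "NUMBER.TXT" FOR BINARY AS #1    'reopen each file to get its size
- PRINT "Sequential file length:"; LOF(1)
- CLOSE #1
- OPEN "NUMBER.BIN" FOR BINARY AS #1
- PRINT "Binary file length:"; LOF(1)
- CLOSE #1
-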
- Furthermore, converting between numbers and their ASCII representation
- is one of the slowest operations in BASIC. Because the STR$ and VAL
- functions must be able to operate on floating point numbers and perform
- rounding, they are extremely slow. For example, VAL must examine the
- digits in a string for many special characters such as "e", "d", "&H", and
- so forth. And with the statement IntVar% = VAL("1234.56"), VAL must also
- round the value to 1235 before assigning the result to IntVar%. Even if
- you don't use STR$ or VAL explicitly when reading or writing a file, BASIC
- does internally. That is, the statement PRINT #1, D# is compiled as if you
- used PRINT #1, STR$(D#). Likewise, INPUT #1, IntVar% is compiled the same
- as INPUT #1, Temp$: IntVar% = VAL(Temp$).
- When a file has been opened for binary access you may not use PRINT #,
- WRITE #, or PRINT # USING. The only statement that can write data to a
- binary file is PUT #. PUT # may be used with any type of variable, but not
- constants or expressions. That is, you can use PUT #1, , AnyVar, but not
- PUT #1, , 13 or PUT #1, SeekLoc, X + Y! or PUT #1, , LEFT$(Work$, 10).
- This is yet another unnecessary BASIC limitation, which means that to write
- a constant you must first assign it to a temporary variable, and then use
- PUT specifying that variable.
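-
- In practice the workaround looks something like this sketch. The
- CONST.BIN file name and all of the variable names are arbitrary.
-
- OPEN "CONST.BIN" FOR BINARY AS #1
-
- Temp% = 13                            'PUT #1, , 13 is not allowed
- PUT #1, , Temp%
-
- X! = 1.5: Y! = 2.25
- Sum! = X! + Y!                        'nor is PUT #1, , X! + Y!
- PUT #1, , Sum!
-
- Work$ = "A longer work string"
- Temp$ = LEFT$(Work$, 10)              'nor is PUT #1, , LEFT$(Work$, 10)
- PUT #1, , Temp$
-
- CLOSE #1
-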
- Reading from a binary file requires GET #, which is the complement of
- PUT #. Like PUT #, GET # may be used with any kind of variable, including
- TYPE variables. When a string variable is written to disk with PUT #, the
- entire string is sent. However, when a string variable is used with GET #,
- BASIC reads only as many bytes as will fit into the target string. So to
- read, say, 20 bytes into a string from a binary file you would use this:
-
- Temp$ = SPACE$(20) 'make room for 20 bytes
- GET #FileNum, , Temp$ 'read all 20 bytes
-
- Although fixed-length strings cannot be cleared to relinquish the memory
- they occupied, they are equally valid for reading data from a binary file:
-
- DIM FLen AS STRING * 20
- GET #FileNum, , FLen
-
- You can also use INPUT$ to read a specified number of bytes from a binary
- file. Therefore you can replace both examples above with the statement
- Temp$ = INPUT$(20, #FileNum). Contrary to some versions of Microsoft BASIC
- documentation, PUT # does not store the length of the string in a binary
- file prior to writing the data as it does with files opened for RANDOM.
- As you've seen, data is written to a binary file using the PUT #
- command, and read using GET #. These work much like their random access
- counterparts in that a seek offset is optional, and if omitted must be
- replaced with an empty comma placeholder. But where the seek argument in a
- random GET # or PUT # specifies a record number, a binary GET # or PUT #
- treats it as a byte offset into the file.
- The first byte in a binary file is considered by BASIC to be byte
- number 1. This is important to point out now, because DOS considers the
- first byte to be numbered 0. When we discuss using CALL Interrupt to
- access files in Chapter 12, you will need to take this difference into
- account.
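-
- The brief sketch below illustrates BASIC's one-based numbering; OFFSET.BIN
- is an arbitrary file name. Subtracting 1 from the value SEEK returns gives
- the zero-based offset that DOS would use for the same location.
-
- OPEN "OFFSET.BIN" FOR BINARY AS #1
- A% = 1: B% = 2
- PUT #1, 1, A%                         'BASIC bytes 1 and 2 (DOS 0 and 1)
- PUT #1, , B%                          'BASIC bytes 3 and 4 (DOS 2 and 3)
- PRINT SEEK(1)                         'the next access is at byte 5
- GET #1, 3, C%                         'read the second integer back
- PRINT C%
- DosOffset& = SEEK(1) - 1              'the equivalent zero-based offset
- PRINT DosOffset&
- CLOSE #1
-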
- When reading and writing binary files, BASIC always uses the length of
- the specified variable to know how many bytes to read or write. The
- statement GET #1, , IntVar% reads two bytes at the current DOS seek
- location into the integer variable IntVar%, and PUT #1, 1000, DblVar#
- writes the contents of the double precision variable DblVar# (eight
- bytes) to the file starting at the 1000th byte. Let's now take a look at
- a practical application of binary file techniques.
- Rather than invent a binary file format as an example, I will instead
- use the Lotus 1-2-3 file structure to illustrate the effective use of
- binary access. Although it is possible to skip around in a binary file and
- read its data in any arbitrary order, a Lotus worksheet file is intended to
- be read sequentially. Each data item is preceded by an integer code that
- indicates the type and length of the data that follows. Note that the same
- format is used by Lotus 1-2-3 versions 1 and 2, and also Lotus Symphony.
- Newer versions of 1-2-3 that support three-dimensional worksheets use a
- different format that this program will not accommodate.
- A Lotus spreadsheet can contain as many as 63 different kinds of data.
- However, we will concern ourselves with only those that are of general
- interest such as cell contents and simple formatting commands. These are
- Beginning of File, End of File, Integer values, Floating point values, Text
- labels and their format, and the double precision values embedded within a
- Formula record. The format used by the actual formulas is quite complex,
- and will not be addressed. Other records that will not be covered here are
- those that pertain to the structure of the worksheet itself, such as
- range names, printer setup strings, and macro definitions. You
- can get complete information on the Lotus file structure as well as other
- standard formats in Jeff Walden's excellent book, *File Formats for Popular
- PC Software* (Wiley Press, ISBN 0-471-83671-0). [Unfortunately that book
- is now out of print. But you may be able to get this information from
- Lotus directly.]
- A Lotus file is composed of individual records, and each record may
- have a varying length. The length of a record depends on its type and
- contents, and most records contain a fixed-length header which describes
- the information that follows. Regardless of the type of record being
- considered, each follows the same format: an operation code (opcode), the
- data length, and the data itself.
- The opcode is always a two-byte integer which identifies the type of
- data that will follow. For example, an opcode of 15 indicates that the
- data in the record will be treated by 1-2-3 as a text label. The length is
- also an integer, and it holds the number of bytes in the Data section (the
- actual text) that follows.
- All of the records that pertain to a spreadsheet cell contain a
- five-byte header at the beginning of the data section. These five bytes
- are included as part of the data's length word. The first header byte
- contains the formatting information, such as the number of decimal
- positions to display. The next two bytes together contain the cell's
- column as an integer, and the following two bytes hold the cell's row.
- Again, this header is present only in records that refer to a cell's
- contents. For example, the Beginning of File and End of File records do
- not contain a header, nor do those records that describe the worksheet.
- Some records such as labels and formulas will have a varying length, while
- those that contain numbers will be fixed, depending on the type of number.
- Floating point values are always eight bytes long, and are in the same IEEE
- format used by BASIC. Likewise, an integer value will always have a length
- of two bytes. Because the length word includes the five-byte header size,
- the total length for these double precision and integer examples is 13 and
- 7 respectively.
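-
- Because the header has a fixed layout, one alternative to reading its
- three parts separately is to describe it with a TYPE and fetch all five
- bytes with a single GET #. The program later in this section does not do
- this. The CellHeader name and its member names are hypothetical, and the
- member order follows the order in which that program's GetFormat
- subprogram reads the values.
-
- TYPE CellHeader                       'hypothetical 5-byte cell header
-   Fmt AS STRING * 1                   'the format byte
-   Column AS INTEGER                   'zero-based column number
-   Row AS INTEGER                      'zero-based row number
- END TYPE
- DIM Header AS CellHeader
- PRINT LEN(Header)                     'prints 5: members are not padded
-
- 'With a worksheet file open as FileNum, and positioned just past a cell
- 'record's length word, one statement would then read the whole header:
- '   GET #FileNum, , Header
- '   Format = ASC(Header.Fmt)
-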
- It is important to understand that in a Lotus worksheet file, rows and
- columns are based at zero. Even though 1-2-3 considers the topmost row to
- be number 1, it is stored in the file as a zero. Likewise, the first
- column as displayed by 1-2-3 is labelled "A", but is identified in the file
- as column 0. Thus, it is up to your program to take that into account as
- it translates the columns to the alphabetic format, if you intend to
- display them as Lotus does.
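-
- One way to perform that translation is sketched below. ColName$ is a
- hypothetical function, not part of the program that follows, and it
- assumes a column number between 0 and 255, which covers columns A
- through IV.
-
- DEFINT A-Z
- DECLARE FUNCTION ColName$ (Column)
-
- 'The next line prints A Z AA IV
- PRINT ColName$(0); " "; ColName$(25); " "; ColName$(26); " "; ColName$(255)
- Row = 0                               'row number as stored in the file
- PRINT "Displayed row:"; Row + 1       'row number as 1-2-3 shows it
- END
-
- FUNCTION ColName$ (Column) STATIC
-   IF Column < 26 THEN                 'a single letter, A through Z
-     ColName$ = CHR$(65 + Column)
-   ELSE                                'two letters, AA through IV
-     ColName$ = CHR$(64 + Column \ 26) + CHR$(65 + Column MOD 26)
-   END IF
- END FUNCTION
-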
- In the Read portion of the program that follows, the same steps are
- performed for each record. That is, binary GET # statements read the
- record's type, length, and data. If the record type indicates that it
- pertains to a worksheet cell, then the five-byte header is also read using
- the GetFormat subprogram. Opcodes that are not supported by this program
- are simply displayed, so you will see that they were encountered.
- The Write portion of the program performs simple formatting, and also
- ensures that a column-width record is written only once. Table 6-2 shows
- the makeup of the numeric formatting byte used in all Lotus files.
-
-
- bits -->                7  6  5  4  3  2  1  0
-                         ^  ^  ^  ^  ^  ^  ^  ^
-                         |  |  |  |  |  |  |  |
- protected if set -------+  |  |  |  |  |  |  |
- type of format ------------+--+--+  |  |  |  |
- number of digits -------------------+--+--+--+
-
-                            ^  ^  ^
-                            |  |  |
- fixed number of digits     0  0  0
- exponential notation       0  0  1
- currency                   0  1  0
- percent                    0  1  1
- flag to add commas         1  0  0
- unused                     1  0  1
- unused                     1  1  0
- other format               1  1  1
-
- Table 6-2: The Structure of a Lotus 1-2-3 Format Byte.
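-
- Decoding a format byte according to Table 6-2 requires only a few AND
- operations, as in this sketch. The variable names are arbitrary, and the
- value 34 is simply an example byte (currency with two decimal places).
-
- DEFINT A-Z
- Format = 34                           'example: currency, two decimals
- Protected = (Format AND 128) <> 0     'bit 7: -1 if the cell is protected
- FmtType = (Format AND 112) \ 16       'bits 6 through 4: type of format
- Digits = Format AND 15                'bits 3 through 0: number of digits
- PRINT Protected; FmtType; Digits      'prints 0, 2, and 2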
-
-
- The program example below can either read or write a Lotus 1-2-3 worksheet
- file. If you select Create when this program is run, it will write a
- worksheet file named SAMPLE.WKS suitable for reading into any version of
- Lotus 1-2-3. This sample file contains an assortment of labels and values.
- If you select Read, the program will prompt for the name of a worksheet
- file which it then reads and displays.
-
- DEFINT A-Z
- DECLARE SUB GetFormat (Format, Row, Column)
- DECLARE SUB WriteColWidth (Column, ColWidth)
- DECLARE SUB WriteInteger (Row, Column, ColWidth, Temp)
- DECLARE SUB WriteLabel (Row, Column, ColWidth, Msg$)
- DECLARE SUB WriteNumber (Row, Col, ColWidth, Fmt$, Num#)
-
- DIM SHARED CellFmt AS STRING * 1 'to read one byte
- DIM SHARED ColNum(40) 'max columns to write
- DIM SHARED FileNum 'the file number to use
-
- CLS
- PRINT "Read an existing 123 file or ";
- PRINT "Create a sample file (R/C)? "
- LOCATE , , 1
- DO
- X$ = UCASE$(INKEY$)
- LOOP UNTIL X$ = "R" OR X$ = "C"
- LOCATE , , 0
- PRINT X$
-
- IF X$ = "R" THEN
-
- '----- read an existing file
- INPUT "Lotus file to read: ", FileName$
- IF INSTR(FileName$, ".") = 0 THEN
- FileName$ = FileName$ + ".WKS"
- END IF
- PRINT
-
- '----- get the next file number and open the file
- FileNum = FREEFILE
- OPEN FileName$ FOR BINARY AS #FileNum
-
- DO UNTIL Opcode = 1 'until End of File code
-
- GET FileNum, , Opcode 'get the next opcode
- GET FileNum, , Length 'and the data length
-
- SELECT CASE Opcode 'filter the Opcodes
-
- CASE 0 'Beginning of File record
- PRINT "Beginning of file, Lotus ";
- GET FileNum, , Temp
-
- SELECT CASE Temp
- CASE 1028
- PRINT "1-2-3 version 1.0 or 1A"
- CASE 1029
- PRINT "Symphony version 1.0"
- CASE 1030
- PRINT "123 version 2.x"
- CASE ELSE
- PRINT "NOT a Lotus File!"
- END SELECT
-
- CASE 1 'End of File
- PRINT "End of File"
-
- CASE 12 'Blank cell
- 'Note that Lotus saves blank cells only if
- 'they are formatted or protected.
- CALL GetFormat(Format, Row, Column)
- PRINT "Blank: Format ="; Format,
- PRINT "Row ="; Row,
- PRINT "Col ="; Column
-
- CASE 13 'Integer
- CALL GetFormat(Format, Row, Column)
- GET FileNum, , Temp
- PRINT "Integer: Format ="; Format,
- PRINT "Row ="; Row,
- PRINT "Col ="; Column,
- PRINT "Value ="; Temp
-
- CASE 14 'Floating point
- CALL GetFormat(Format, Row, Column)
- GET FileNum, , Number#
- PRINT "Number: Format ="; Format,
- PRINT "Row ="; Row,
- PRINT "Col ="; Column,
- PRINT "Value ="; Number#
-
- CASE 15 'Label
- CALL GetFormat(Format, Row, Column)
- 'Create a string to hold the label. 6 is
- 'subtracted to exclude the 5-byte header
- 'and the trailing CHR$(0) byte.
-
- Info$ = SPACE$(Length - 6)
- GET FileNum, , Info$ 'read the label
- GET FileNum, , CellFmt$ 'eat the CHR$(0)
- PRINT "Label: Format ="; Format,
- PRINT "Row ="; Row,
- PRINT "Col ="; Column, Info$
-
- CASE 16 'Formula
- CALL GetFormat(Format, Row, Column)
- GET FileNum, , Number# 'read cell value
- GET FileNum, , Length 'and formula length
- SEEK FileNum, SEEK(FileNum) + Length 'skip formula
- PRINT "Formula: Format ="; Format,
- PRINT "Row ="; Row,
- PRINT "Col ="; Column,
- PRINT "Value ="; Number#
-
- CASE ELSE
- Dummy$ = SPACE$(Length) 'skip the record
- GET FileNum, , Dummy$ 'read it in
- PRINT "Opcode: "; Opcode 'show its Opcode
-
- END SELECT
-
- '----- pause when the screen fills
- IF CSRLIN > 21 THEN
- PRINT
- PRINT "Press <ESC> to end or ";
- PRINT "any other key for more"
- DO
- K$ = INKEY$
- LOOP UNTIL LEN(K$)
- IF K$ = CHR$(27) THEN EXIT DO
- CLS
- END IF
-
- NumRecs = NumRecs + 1 'count the records
-
- LOOP
- PRINT "Number of Records Processed ="; NumRecs
- CLOSE
-
- ELSE
-
- '----- write a sample file
- FileNum = FREEFILE 'as above
- OPEN "SAMPLE.WKS" FOR BINARY AS #FileNum
-
- Temp = 0 'OpCode for Start of File
- PUT FileNum, , Temp 'write that
- Temp = 2 'its data length is 2
- PUT FileNum, , Temp 'since it's an integer
- Temp = 1030 'Lotus version 2.x
- PUT FileNum, , Temp
-
- Row = 0 'write this in Row 1
- DO
- CALL WriteLabel(Row, 0, 16, "This is a Label")
- CALL WriteLabel(Row, 1, 12, "So is this")
- CALL WriteInteger(Row, 2, 7, 12345)
- CALL WriteNumber(Row, 3, 9, "C2", 57.23#)
- CALL WriteNumber(Row, 4, 9, "F5", 12.3456789#)
- CALL WriteInteger(Row, 6, 9, 99) 'skip a column for fun
- Row = Row + 1 'go on to the next row
- LOOP WHILE Row < 6
-
- '----- Write the End of File record and close the file
- Temp = 1 'Opcode for End of File
- PUT FileNum, , Temp
- Temp = 0 'the data length is zero
- PUT FileNum, , Temp
- CLOSE
-
- END IF
- END
-
- SUB GetFormat (Format, Row, Column) STATIC
- GET FileNum, , CellFmt$: Format = ASC(CellFmt$)
- GET FileNum, , Column
- GET FileNum, , Row
- END SUB
-
- SUB WriteColWidth (Column, ColWidth) STATIC
-
- '----- allow a column width only once for each column
- IF NOT ColNum(Column) THEN
- Temp = 8
- PUT FileNum, , Temp
- Temp = 3
- PUT FileNum, , Temp
- PUT FileNum, , Column
- Temp$ = CHR$(ColWidth)
- PUT FileNum, , Temp$
- '----- show we wrote this column's width
- ColNum(Column) = -1
- END IF
-
- END SUB
-
- SUB WriteInteger (Row, Column, ColWidth, Integ) STATIC
-
- Temp = 13 'OpCode for an integer
- PUT FileNum, , Temp
- Temp = 7 'Length + 5 byte header
- PUT FileNum, , Temp
- Temp$ = CHR$(127) 'the format portion
- PUT FileNum, , Temp$
- PUT FileNum, , Column
- PUT FileNum, , Row
- PUT FileNum, , Integ
- CALL WriteColWidth(Column, ColWidth)
-
- END SUB
-
- SUB WriteLabel (Row, Column, ColWidth, Msg$)
-
- IF LEN(Msg$) > 240 THEN '240 is the maximum length
- Msg$ = LEFT$(Msg$, 240)
- END IF
-
- Temp = 15 'OpCode for a label
- PUT FileNum, , Temp
- Temp = LEN(Msg$) + 7 'Length plus 5-byte header
- 'plus "'" plus CHR$(0)
- PUT FileNum, , Temp
- Temp$ = CHR$(127) '127 is the default format
- PUT FileNum, , Temp$
- PUT FileNum, , Column
- PUT FileNum, , Row
- Temp$ = "'" + Msg$ + CHR$(0) 'a "'" left-aligns a label
- 'use "^" instead to center
- PUT FileNum, , Temp$
- CALL WriteColWidth(Column, ColWidth)
-
- END SUB
-
- SUB WriteNumber (Row, Col, ColWidth, Fmt$, Num#) STATIC
-
- IF LEFT$(Fmt$, 1) = "F" THEN 'fixed
- '----- specify the number of decimal places
- Format$ = CHR$(0 + VAL(RIGHT$(Fmt$, 1)))
- ELSEIF LEFT$(Fmt$, 1) = "C" THEN 'currency
- Format$ = CHR$(32 + VAL(RIGHT$(Fmt$, 1)))
- ELSEIF LEFT$(Fmt$, 1) = "P" THEN 'percent
- Format$ = CHR$(48 + VAL(RIGHT$(Fmt$, 1)))
- ELSE 'default
- Format$ = CHR$(127) 'use CHR$(255) for protected
- END IF
-
- Temp = 14 'Opcode for a number
- PUT FileNum, , Temp
- Temp = 13 'Length (8) + 5 = 13
- PUT FileNum, , Temp
-
- PUT FileNum, , Format$
- PUT FileNum, , Col
- PUT FileNum, , Row
- PUT FileNum, , Num#
-
- CALL WriteColWidth(Col, ColWidth)
-
- END SUB
-
- There are several points worth noting about this program. First, Lotus
- label strings are always terminated with a CHR$(0) zero byte, which is the
- same method used by DOS and the C language. Therefore, the WriteLabel
- subprogram adds this byte, which is also included as part of the length
- word that follows the Opcode.
- In the WriteNumber subprogram, the 1-byte format code is either 127 to
- default to unformatted, or bit-coded to indicate fixed, currency, or
- percent formatting. WriteNumber expects a format string such as "F3" which
- indicates fixed-point with three decimal positions, or "P1" for percent
- formatting using one decimal place. Likewise, "C2" selects currency
- formatting with two decimal places. The digit is required: if you use
- "C" alone, VAL returns 0 and the currency format shows no decimals.
- Earlier I pointed out that extra work is needed to write a constant
- value to a binary file, because only variables may be used with PUT #.
- This is painfully clear in each of the Write subprograms, where the integer
- variable Temp is repeatedly assigned new values. We can only hope that
- Microsoft will see fit to remove this arbitrary limitation in a later
- version of BASIC.
- Finally, note the use of the fixed-length string CellFmt$. Although
- some languages support a one-byte numeric variable type, BASIC does not.
- Therefore, to read and write these values you must use a fixed-length
- string. To determine the value after reading a file you will use ASC, and
- to assign a value prior to writing it you instead use CHR$. For example,
- to assign the byte value 123 to CellFmt$ use CellFmt$ = CHR$(123).
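-
- Here is that idea as a self-contained sketch. OneByte and BYTE.BIN are
- arbitrary names chosen for the illustration.
-
- DIM OneByte AS STRING * 1
- OPEN "BYTE.BIN" FOR BINARY AS #1
- OneByte = CHR$(123)                   'assign the byte value with CHR$
- PUT #1, 1, OneByte                    'write exactly one byte
- OneByte = CHR$(0)                     'clear it to prove the read works
- GET #1, 1, OneByte                    'read the byte back
- PRINT ASC(OneByte)                    'prints 123
- CLOSE #1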
-
-
- NAVIGATING YOUR FILES
-
- BASIC offers a number of file-related functions to determine how long a
- file is, the current DOS seek location where the next read or write will
- take place, and also if that location is at the end of the file. These are
- LOF, LOC and SEEK, and EOF respectively. LOF stands for Length Of File,
- LOC means current Location, and EOF is End Of File. The SEEK statement is
- also available to force the next file access to occur at a specified place
- within the file. All of these require a file number argument to indicate
- which file is being referred to.
-
-
- The EOF Function
-
- The EOF function is most useful when reading sequential text files, and it
- avoids BASIC's "Input past end" error that would otherwise result from
- trying to read past the end of the available data. The following short
- complete program reads a text file and displays its contents, and shows how
- EOF is used for this purpose.
-
-
- OPEN FileName$ FOR INPUT AS #1
- WHILE NOT EOF(1)
- LINE INPUT #1, This$
- PRINT This$
- WEND
- CLOSE
-
-
- Notice the use of the NOT operator in this example. The EOF function
- returns an integer value of either -1 or 0, to indicate true (at the end of
- the file) or false. Therefore, NOT -1 is equal to 0 (False), and NOT 0 is
- equal to -1 (True). This use of bit manipulation was described earlier in
- Chapter 2.
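-
- Because NOT operates on bits rather than on logical values, it behaves
- as a logical negation only when applied to exactly -1 or 0, which is what
- EOF returns. The three statements below show what happens otherwise.
-
- PRINT NOT -1                          'prints 0 (False)
- PRINT NOT 0                           'prints -1 (True)
- PRINT NOT 1                           'prints -2, which IF treats as True
-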
- EOF can also be used with binary and random access files for the same
- purpose. In fact, EOF may be even more useful in those cases, because
- BASIC does not create an error when you attempt to read past the end as it
- does for sequential files. Indeed, once you go past the end of a binary or
- random access file, BASIC simply fills the variables being read with zero
- bytes. Without EOF there is no way to distinguish between zeros returned
- by BASIC because you went past the end of the file and zeros that were read
- as legitimate data.
- The EOF function was originally needed with DOS 1.0 for a program to
- determine when the end of the file was reached. That version of DOS always
- wrote all data in multiples of 128 bytes, and all file directory entries
- also were listed with lengths being a multiple of 128. [That is, a file
- which contains only ten bytes of data will be reported by DIR as being 128
- bytes long.] To indicate the true end of the file, a CHR$(26) end of file
- marker was placed just past the last byte of valid data. Thus, EOF was
- originally written to search for a byte with that value, and return True
- when it was found.
- Most modern applications do not use an EOF character, and instead rely
- on the file length that is stored in the file's directory entry. However,
- some older programs still write a CHR$(26) at the end of the data, and DOS'
- COPY CON command does this as well. Therefore, BASIC's EOF will return a
- True value when this character is encountered, even if there is still more
- data to be read in the file. In fact, you can provide a minimal amount of
- data security by intentionally writing a CHR$(26) at or near the beginning
- of a sequential file. If someone then uses the DOS TYPE command to view
- the file, only what precedes the EOF marker will be displayed.
- Another implication of EOF characters in BASIC surfaces when you open
- a sequential file for append mode. BASIC makes a minimal attempt to locate
- an EOF character, and if one exists it begins appending on top of it.
- After all, if writing started just past the EOF byte, a subsequent LINE
- INPUT would fail when it reached that point. Likewise, an EOF test would
- return true and the program would stop reading at that location in the
- file. Therefore, BASIC checks the last few bytes in the file when you open
- for append, to see if an EOF marker is present. However, if the marker is
- much earlier in a large file, BASIC will not see it.
- When EOF is used with serial communications, it returns 0 until a
- CHR$(26) byte is received, at which point it continues to return -1 until
- the communications port is closed.
-
-
- The LOF Function
-
- The LOF function simply returns the current length of the file, and that
- too can be used as a way to tell when you have reached the end. In the
- random access FIELD example program shown earlier, LOF was used in
- conjunction with the record length to determine the number of records in
- the file. Since the length of most random access files is directly related
- to [and evenly divisible by] the number of records in the file, simple
- division can be used to determine how many records there are. The formula
- is NumRecords = LOF(FileNum) \ RecLength.
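-
- Put into practice, the formula lets a program visit every record without
- relying on EOF. In this sketch the Item TYPE and the ITEMS.DAT file are
- hypothetical, and the file is assumed to contain whole records only.
-
- TYPE Item                             'hypothetical record layout
-   Code AS INTEGER
-   Descr AS STRING * 20
- END TYPE
- DIM Rec AS Item
-
- RecLength% = LEN(Rec)
- OPEN "ITEMS.DAT" FOR RANDOM AS #1 LEN = RecLength%
- NumRecords& = LOF(1) \ RecLength%     'file length divided by record size
- FOR RecNum& = 1 TO NumRecords&
-   GET #1, RecNum&, Rec
-   PRINT RecNum&; Rec.Code; Rec.Descr
- NEXT
- CLOSE #1
-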
- Understand that LOF always returns the length of the file in bytes,
- whether the file was opened for sequential, binary, or random access.
- That is why a random access record count requires the division shown above.
- LOF can also be used as a crude way to see if a file exists. Even
- though this is done much more effectively and elegantly with assembly
- language or CALL Interrupt, the short example below shows how LOF can be
- used for this purpose.
-
-
- FUNCTION Exist% (FileName$) STATIC
- FileNum = FREEFILE
- OPEN FileName$ FOR BINARY AS #FileNum
- Length& = LOF(FileNum) 'LOF returns a long integer
- CLOSE #FileNum
- IF Length& = 0 THEN 'it probably wasn't there
- Exist% = 0 'return False to show that
- KILL FileName$ 'and delete what we created
- ELSE
- Exist% = -1 'otherwise return True
- END IF
- END FUNCTION
-
-
- Besides being clunky, this program also has a serious flaw: If the file
- does exist but has a perfectly legal length of zero, this function will say
- it doesn't exist and then delete it! As I said, this method is crude, but
- a lot of programmers have used it.
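-
- If you prefer to stay in pure BASIC, one less destructive alternative is
- to open the file for INPUT, which never creates a file, and trap the
- error that occurs when it isn't there. The sketch below is only one
- possible arrangement. ErrCode and ErrHandler are hypothetical names, and
- a program compiled with BC.EXE needs the /x switch because RESUME NEXT
- is used.
-
- DEFINT A-Z
- DECLARE FUNCTION Exist% (FileName$)
- DIM SHARED ErrCode                    'set by the error handler below
-
- PRINT Exist%("SAMPLE.WKS")            '-1 if present, 0 if not
- END
-
- ErrHandler:                           'must be in the main module
-   ErrCode = ERR                       'remember which error occurred
-   RESUME NEXT                         'and continue after the OPEN
-
- FUNCTION Exist% (FileName$) STATIC
-   ErrCode = 0
-   FileNum = FREEFILE
-   ON ERROR GOTO ErrHandler
-   OPEN FileName$ FOR INPUT AS #FileNum
-   ON ERROR GOTO 0                     'turn error trapping back off
-   IF ErrCode = 0 THEN                 'the OPEN succeeded
-     CLOSE #FileNum
-     Exist% = -1
-   ELSE                                'the OPEN failed
-     Exist% = 0
-   END IF
- END FUNCTION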
-
-
- The LOC and SEEK Functions
-
- LOC and SEEK are closely related, in that they return information about
- where you are in the file. However, LOC reports the position of the last
- read or write, and SEEK tells where the next one will occur. Both return
- byte values for files that were opened for binary access, and record
- numbers when used with random access files. With a sequential file SEEK
- also returns a byte value, but LOC does not.
- In practice, LOC is of little value, especially when you are
- manipulating sequential files. For reasons that only Microsoft knows, LOC
- returns the number of the last byte read or written, but *divided by 128*.
- Since no program I know of treats sequential files as containing 128-byte
- records, I cannot imagine how this could be useful. Further, since LOC
- returns the location of the *last* read or write, it never reflects the
- true position in the file.
- When used with communications, LOC reports the number of characters in
- the receive buffer that are currently waiting to be read, which is useful.
- When used with INPUT$ #, LOC provides a handy way to retrieve all of the
- characters present in the buffer at one time. This is shown in context
- below, and the example assumes that the communications port has already
- been opened.
-
-
- NumChars = LOC(1)
- IF NumChars THEN
- This$ = INPUT$(NumChars, #1)
- END IF
-
-
- The SEEK function always returns the current file position, which is the
- point at which the next read or write will take place. One good use for
- SEEK is to read the current location in a sequential file, to allow a
- program to walk backwards through the file later. For example, if you need
- to create a text file browsing program, there is no other way to know where
- the previous line of a file is located. A short program that shows this in
- context follows in the section that describes the SEEK statement.
-
-
- The SEEK Statement
-
- Where the SEEK function lets you determine where you are currently in a
- file, the SEEK statement lets you move to any arbitrary position. As you
- might imagine, SEEK as a statement is similar to the function version in
- that it assumes a byte value when used with sequential and binary files,
- and a record number with random access files.
- SEEK can be very useful in a variety of situations, and in particular
- when indexing random access files. When an indexing system is employed,
- selected portions of a data file are loaded into memory where they can be
- searched very quickly. Since the location of the index information being
- searched corresponds to the record number of the complete data record, the
- record can be accessed with a single GET #. This was described briefly in
- the discussion of the BASIC PDS ISAM options in Chapter 5. Thus, once the
- record number for a given entry has been identified, the SEEK statement (or
- the SEEK argument in the GET # command) is used to access that particular
- record.
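-
- Before moving on, here is a bare-bones sketch of the indexing idea just
- described. The Person TYPE and the PEOPLE.DAT file are hypothetical, the
- search is a simple linear scan, and the file is assumed to hold no more
- than a few thousand records so that the index fits in string memory.
-
- DEFINT A-Z
- TYPE Person                           'hypothetical record layout
-   LastName AS STRING * 15
-   Phone AS STRING * 12
- END TYPE
- DIM Rec AS Person
-
- OPEN "PEOPLE.DAT" FOR RANDOM AS #1 LEN = LEN(Rec)
- NumRecs = LOF(1) \ LEN(Rec)
- IF NumRecs > 0 THEN
-   '----- build the index: element X holds the key for record X
-   REDIM Index$(1 TO NumRecs)
-   FOR X = 1 TO NumRecs
-     GET #1, X, Rec
-     Index$(X) = RTRIM$(Rec.LastName)
-   NEXT
-
-   '----- search in memory, then read only the matching record
-   LINE INPUT "Last name to find: ", Find$
-   FOR X = 1 TO NumRecs
-     IF Index$(X) = Find$ THEN
-       GET #1, X, Rec                  'or SEEK #1, X: GET #1, , Rec
-       PRINT Rec.LastName; Rec.Phone
-       EXIT FOR
-     END IF
-   NEXT
- END IF
- CLOSE #1
-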
- For this example, though, I will instead show how SEEK can be used
- with a sequential file. The following complete program provides the
- rudiments of a text file browser, but this version displays only one line
- at a time. It would be fairly easy to expand this program to display
- entire screenfuls of text, and I leave that as an exercise for you.
- The program begins by prompting for a file name, and then opens that
- file for sequential input. The maximum number of lines that can be
- accommodated is set arbitrarily at 5000, though you will not be able to
- specify more than 16384 unless you compile with the /ah option. The long
- integer Offset&() array is used to remember where each line encountered so
- far in the file begins, and 16384 is the maximum number of elements that
- can fit into a single 64K array. For a typical text file with line lengths
- that average 60 characters, 16384 lines is nearly 1MB of text.
- When you run the program, it expects only the up and down arrow keys
- to advance and go backwards through the file, the Home key to jump to the
- beginning, or the Escape key to end the program. Notice that the words
- "blank line" are printed when a blank line is encountered, just so you can
- see that something has happened.
-
- DEFINT A-Z
- CONST MaxLines% = 5000
- REDIM Offset&(1 TO MaxLines% + 1) 'one extra for the next offset
-
- CLS
- PRINT "Enter the name of file to browse: ";
- LINE INPUT "", FileName$
-
- OPEN FileName$ FOR INPUT AS #1
-
- Offset&(1) = 1 'initialize to offset 1
- CurLine = 1 'and start with line 1
-
- WHILE Action$ <> CHR$(27) 'until they press Escape
- SEEK #1, Offset&(CurLine) 'seek to the current line
- LINE INPUT #1, Text$ 'read that line
- Offset&(CurLine + 1) = SEEK(1) 'save where the next
- ' line starts
- CLS
- IF LEN(Text$) THEN 'if it's not blank
- PRINT Text$ 'print the line
- ELSE 'otherwise
- PRINT "(blank line)" 'show that it's blank
- END IF
-
- DO 'wait for a key
- Action$ = INKEY$
- LOOP UNTIL LEN(Action$)
-
- SELECT CASE ASC(RIGHT$(Action$, 1))
- CASE 71 'Home
- CurLine = 1
-
- CASE 72 'Up arrow
- IF CurLine > 1 THEN
- CurLine = CurLine - 1
- END IF
-
- CASE 80 'Down arrow
- IF (NOT EOF(1)) AND CurLine < MaxLines% THEN
- CurLine = CurLine + 1
- END IF
-
- CASE ELSE
- END SELECT
- WEND
- CLOSE
- END
-
- You should be aware that BASIC does not prevent you from using SEEK to go
- past the end of a file that has been opened for Binary access. If you do
- this and then write any data, DOS will actually extend the file to include
- the data that was just written. Therefore, it is important to understand
- that any data that lies between the previous end of the file and the newly
- added data will be undefined. When a file is deleted DOS simply abandons
- the sectors that held its data, and makes them available for later use.
- But whatever data those sectors contained remains intact. When you later
- expand a file this way using SEEK, the old abandoned sector contents are
- incorporated into the file. Even if the sectors that are allocated were
- never written to previously, they will contain the &HF6 bytes that DOS'
- FORMAT.COM uses to initialize a disk.
- You can turn this behavior into an important feature, and in some
- cases recreate a file that was accidentally truncated. If you erase a file
- by mistake, it is possible to recover it using the Norton Utilities or a
- similar disk utility program. But when an existing file is opened for
- output, DOS truncates it to a length of zero. The following program shows
- the steps necessary to reconstruct a file that has been destroyed this way.
-
-
- OPEN FileName$ FOR BINARY AS #1
- SEEK #1, 30000
- PUT #1, , X%
- CLOSE #1
-
-
- In this case the file is restored to a length of 30001 bytes (the two-byte
- integer is written at offsets 30000 and 30001), and you can use larger or
- smaller values as appropriate. Understand that there is no
- guarantee that DOS will reassign the same sectors to the file that it
- originally used. But I have seen this trick work more than once, and it is
- at least worth a try.
- In a similar fashion, you can reduce the size of a file by seeking to
- a given location and then writing *zero* bytes there. Since BASIC provides
- no way to write zero bytes to a file, some additional trickery is needed.
- This will be described in Chapter 12 in the section that discusses using
- CALL Interrupt to access DOS and BIOS services.
-
-
- ADVANCED FILE TECHNIQUES
- ========================
-
- There are a number of clever file-related tricks that can be performed
- using only BASIC programming. Some of these tricks help you to improve on
- BASIC's speed, and others let you do things that are not possible using the
- normal and obvious methods. BASIC is no slower than other languages when
- reading and writing large amounts of data, and indeed, the bottleneck is
- frequently DOS itself. Further, if you can reduce the amount of data that
- is written, your files will be smaller as well. With that in mind, let's
- look at some ways to further improve your programs.
-
-
- SPEEDING UP FILE ACCESS
-
- The single most important way to speed up your programs is to read and
- write large amounts of data in one operation. The normal method for saving
- a numeric or TYPE array is to write each element to disk in a loop. But
- when there are many thousands of elements, a substantial amount of overhead
- is incurred just from BASIC's repeated calls to DOS. There are several
- solutions you can consider, each with increasing levels of complexity.
-
-
- BLOAD and BSAVE
-
- The simplest way to read and write a large amount of contiguous data is
- with BLOAD and BSAVE. BSAVE takes a "snapshot" of any contiguous area of
- memory up to 64K in size, and saves it to disk in a single operation. When
- an application calls DOS to read or write a file, it furnishes DOS with the
- segment and address of the memory to load into or save from, and also
- the number of bytes. BLOAD and BSAVE provide a simple interface to the DOS
- read and write services, and they can be used to load and save numeric
- arrays up to 64K in size, as well as screen images.
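-
- As a minimal sketch of the screen image case, the fragment below saves and
- then restores an 80 by 25 text screen. It assumes a color adapter whose
- text memory begins at segment &HB800 (a monochrome adapter uses &HB000),
- and the file name SCREEN.BIN is only a placeholder.
-
```basic
DEFINT A-Z
CONST ScreenBytes% = 4000          '80 columns * 25 rows * 2 bytes each

CLS
PRINT "This text will be saved and then restored."

DEF SEG = &HB800                   'assumed: color text video memory
BSAVE "SCREEN.BIN", 0, ScreenBytes%

CLS                                'wipe the screen to prove the reload
BLOAD "SCREEN.BIN", 0              'DEF SEG is still &HB800 here
DEF SEG                            'restore BASIC's default segment
END
```
-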
- [I have seen a number of messages in the MSBASIC forum on CompuServe
- stating that BSAVE and BLOAD do not work with compressed disks. Many of
- those messages have come from Microsoft technical support, and I have no
- reason to doubt them. It may be that only VB/DOS has this problem, but I
- have no way to test QB and PDS because I don't use disk compression.]
- A file that has been written using BSAVE includes a 7-byte header that
- identifies it as a BSAVE file, and also shows where it was saved from and
- how many bytes it contains. BLOAD requires this header, and thus cannot be
- used with any arbitrary type of file. But when used together, these
- commands can be as much as ten times faster than a FOR/NEXT loop.
- The example below creates and then saves an integer array, and then
- loads it again to prove the process worked.
-
-
- DEFINT A-Z
- CONST NumEls% = 20000
- REDIM Array(1 TO NumEls%) 'create the array
-
- FOR X = 1 TO NumEls% 'fill it with values
- Array(X) = X
- NEXT
-
- DEF SEG = VARSEG(Array(1)) 'set the BSAVE segment
- BSAVE "ARRAY.DAT", VARPTR(Array(1)), NumEls% * LEN(Array(1))
-
- REDIM Array(1 TO NumEls%) 'recreate the array
- DEF SEG = VARSEG(Array(1)) 'the array may have moved
- BLOAD "ARRAY.DAT", VARPTR(Array(1))
-
- FOR X = 1 TO NumEls% 'prove the data is valid
- IF Array(X) <> X THEN
- PRINT "Error in element"; X
- END IF
- NEXT
- END
-
-
- Because BSAVE and BLOAD use the current DEF SEG setting to know the segment
- the data is in, VARSEG is used with the first element of the array. Once
- the correct segment has been established, BSAVE is given the name of the
- file to save, the starting address, and the number of bytes of data. As
- with the TYPE variable example shown earlier, LEN is ideal here as well to
- help calculate the number of bytes that must be saved. In this case, each
- integer array element is two bytes long, and BASIC multiplies the constants
- NumEls% and LEN(Array(1)) when the program is compiled. Therefore, no
- additional code is added to the program to calculate this value at runtime.
- Once the array has been saved it is redimensioned, which effectively
- clears it to all zero values prior to reloading. Notice that DEF SEG is
- used again before the BLOAD statement. This is an important point, because
- there is no guarantee that BASIC will necessarily allocate the same block
- of memory the second time. If a file is loaded into the wrong area of
- memory, your program is sure to crash or at least not work correctly.
- Also note that BLOAD always loads the entire file, and a length
- argument is not needed or expected. This brings up an important issue: how
- can you determine how large to dimension an array prior to loading it? The
- answer, as you may have surmised, is to open the file for binary access and
- read the length stored in the BSAVE header. All that's needed is to know
- how the header is organized, as the following program reveals.
-
- DEFINT A-Z
- TYPE BHeader
- Header AS STRING * 1
- Segment AS INTEGER
- Address AS INTEGER
- Length AS INTEGER
- END TYPE
- DIM BLHeader AS BHeader
-
- OPEN "ARRAY.DAT" FOR BINARY AS #1
- GET #1, , BLHeader
- CLOSE
-
- IF ASC(BLHeader.Header) <> &HFD THEN
- PRINT "Not a valid BSAVE file"
- END
- END IF
-
- LongLength& = BLHeader.Length
- IF LongLength& < 0 THEN
- LongLength& = LongLength& + 65536
- END IF
-
- NumElements = LongLength& \ 2
- REDIM Array(1 TO NumElements)
-
- DEF SEG = VARSEG(Array(1))
- BLOAD "ARRAY.DAT", VARPTR(Array(1))
- END
-
- Even though the original segment and address from which the file was saved
- are in the BSAVE header, that information is not used here. In most
- situations you will provide BLOAD with an address to load the file
- to. However, if the address is omitted, BASIC uses the segment and address
- stored in the file, and ignores the current DEF SEG setting. This would be
- useful when handling text and graphics images which are always loaded to
- the same segment from which they were originally saved. But in general I
- recommend that you always define an explicit segment and address.
- There are a few other points worth elaborating on as well. First, the
- program examines the first byte in the file to be sure it is the special
- value &HFD which identifies a BSAVE file. The ASC function is required for
- that, since the only way to define a TYPE component one byte long is as a
- string.
- Second, the length is stored as an unsigned integer, which cannot be
- manipulated directly in a BASIC program if its value exceeds 32767. As you
- learned in Chapter 2, integer values larger than 32767 are treated by BASIC
- as signed, and in this case they are considered negative. Therefore, the
- value is first assigned to a long integer, which is then tested for a value
- less than zero. If it is indeed negative, 65536 is added to the variable
- to convert it to an equivalent positive number. Note that the length in a
- BSAVE header does not include the header length; only the data itself is
- considered.
- If you single-step through this program after running the earlier one
- that created the file, you will see that the code that adds 65536 is
- executed, because the header shows that the file contains 40000 bytes.
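-
- The signed-to-unsigned conversion described above can be isolated in a
- small function. This is only a sketch, and the name ToUnsigned& is my own
- invention rather than anything built into BASIC. The value -25536 is how
- BASIC sees the 40000 stored in the header.
-
```basic
DEFINT A-Z
DECLARE FUNCTION ToUnsigned& (Value)

PRINT ToUnsigned&(-25536)      'the signed form of 40000
PRINT ToUnsigned&(1234)        'positive values pass through unchanged
END

FUNCTION ToUnsigned& (Value) STATIC
  Temp& = Value                'copy to a long integer first
  IF Temp& < 0 THEN Temp& = Temp& + 65536
  ToUnsigned& = Temp&
END FUNCTION
```
-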
- There are two limitations to using BSAVE and BLOAD this way. One
- problem is that you may not want the header to be attached to the file.
- The other, more important problem is that BASIC allows arrays to exceed
- 64K, but BSAVE and BLOAD cannot handle more than 64K at once. Saving a
- single huge array in multiple files is clumsy, and contributes to the
- clutter on your disks. The header issue is less
- important, because you can always access the file with normal binary
- statements after using a SEEK to skip over the header. But the huge array
- problem requires some heavy ammunition.
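-
- As a rough sketch of skipping the header, the fragment below reads the
- ARRAY.DAT file created earlier using ordinary binary statements. It
- assumes the file exists and holds 20000 integers. The element-by-element
- loop is used only to keep the example short; it gives up the speed
- advantage of BLOAD.
-
```basic
DEFINT A-Z
CONST HeaderSize% = 7              'length of the BSAVE header
CONST NumEls% = 20000              'assumed: matches the earlier example

REDIM Array(1 TO NumEls%)

OPEN "ARRAY.DAT" FOR BINARY AS #1
SEEK #1, HeaderSize% + 1           'position to the first data byte
FOR X = 1 TO NumEls%
  GET #1, , Array(X)               'read one two-byte element
NEXT
CLOSE #1
END
```
-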
- One final point worth mentioning is that BSAVE and BLOAD assume a .BAS
- file name extension if none is given. This is incredibly stupid, since the
- contents of a BSAVE file have no relationship to a BASIC source file.
- Therefore, to save a file with no extension at all you must append a period
- to the name: BSAVE "MYFILE.", Address, Length.
-
-
- Beyond BSAVE
-
- The program that follows includes both a demonstration and a pair of
- subprograms that let you save any data regardless of its size or location.
- These routines are primarily intended for saving huge numeric and TYPE
- arrays, but there is no reason they couldn't be used for other purposes.
- However, they cannot be used with conventional variable-length string
- arrays, because the data in those arrays is not contiguous. The file is
- processed in 16K blocks using multiple passes, and the actual saving and
- loading is performed by calling BASIC's internal PUT # and GET # routines.
-
- DEFINT A-Z
- 'NOTE: This program must be compiled with the /ah option.
-
- DECLARE SUB BigLoad (FileName$, Segment, Address, Bytes&)
- DECLARE SUB BigSave (FileName$, Segment, Address, Bytes&)
- DECLARE SUB BCGet ALIAS "B$GET3" (BYVAL FileNum, BYVAL Segment, _
- BYVAL Address, BYVAL NumBytes)
- DECLARE SUB BCPut ALIAS "B$PUT3" (BYVAL FileNum, BYVAL Segment, _
- BYVAL Address, BYVAL NumBytes)
-
- CONST NumEls% = 20000
- REDIM Array&(1 TO NumEls%)
- NumBytes& = LEN(Array&(1)) * CLNG(NumEls%)
-
- FOR X = 1 TO NumEls% 'fill the array
- Array&(X) = X
- NEXT
-
- Segment = VARSEG(Array&(1)) 'save the array
- Address = VARPTR(Array&(1))
- CALL BigSave("ARRAY.DAT", Segment, Address, NumBytes&)
-
- REDIM Array&(1 TO NumEls%) 'clear the array
-
- Segment = VARSEG(Array&(1)) 'reload the array
- Address = VARPTR(Array&(1))
- CALL BigLoad("ARRAY.DAT", Segment, Address, NumBytes&)
-
- FOR X = 1 TO NumEls% 'prove this all worked
- IF Array&(X) <> X THEN
- PRINT "Error in element"; X
- END IF
- NEXT
- END
-
- SUB BigLoad (FileName$, DataSeg, Address, Bytes&) STATIC
-
- FileNum = FREEFILE
- OPEN FileName$ FOR BINARY AS #FileNum
- NumBytes& = Bytes& 'work with copies to
- Segment = DataSeg 'protect the parameters
-
- DO
- IF NumBytes& > 16384 THEN
- CurrentBytes = 16384
- ELSE
- CurrentBytes = NumBytes&
- END IF
- CALL BCGet(FileNum, Segment, Address, CurrentBytes)
- NumBytes& = NumBytes& - CurrentBytes
- Segment = Segment + &H400
- LOOP WHILE NumBytes&
-
- CLOSE #FileNum
-
- END SUB
-
- SUB BigSave (FileName$, DataSeg, Address, Bytes&) STATIC
-
- FileNum = FREEFILE
- OPEN FileName$ FOR BINARY AS #FileNum
- NumBytes& = Bytes& 'work with copies to
- Segment = DataSeg 'protect the parameters
-
- DO
- IF NumBytes& > 16384 THEN
- CurrentBytes = 16384
- ELSE
- CurrentBytes = NumBytes&
- END IF
- CALL BCPut(FileNum, Segment, Address, CurrentBytes)
- NumBytes& = NumBytes& - CurrentBytes
- Segment = Segment + &H400
- LOOP WHILE NumBytes&
-
- CLOSE #FileNum
-
- END SUB
-
- Although BASIC lets you save and load only single variables or array
- elements, its internal library routines can work with data of nearly any
- size. And since TYPE variables can be as large as 64K, these routines must
- be able to accommodate data at least that big. Therefore, BASIC's usual
- restriction on what you can and cannot read or write to disk with GET # and
- PUT # is an arbitrary one.
- Accessing BASIC's internal routines requires that you declare them
- using ALIAS, since it is illegal to call a routine that has a dollar sign
- in its name. As you can see, these routines expect their parameters to be
- passed by value, and this is handled by the DECLARE statements. Normally,
- you cannot call these routines from within the QB editing environment. But
- if you separate the two subprograms and place them into a different module,
- that module can be compiled and added to a Quick Library. That is, the
- subprograms can be together in one file, but not with the demo that calls
- them. Be sure to add the two DECLARE statements that define B$PUT3 and
- B$GET3 to that module as well.
- The long integer array this program creates exceeds the normal 64K
- limit, so the /ah compiler switch must be used. Notice in the BigLoad and
- BigSave subprograms that copies are made of two of the incoming parameters.
- If this were not done, the subprograms would change the passed values,
- which is a bad practice in this case. Also, notice how the segment value
- that is used for saving and loading is adjusted through each pass of the DO
- loop. Since the data is saved in 16K blocks, the segment must be increased
- by 16384 \ 16 = 1024 for each pass. The use of an equivalent &H value here
- is arbitrary; I translated this program from another version written in
- assembly language that used Hex for that number.
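-
- The segment arithmetic can be checked with a few lines of BASIC. This is
- merely an illustration; the starting segment of &H2000 is an arbitrary
- value chosen for the example.
-
```basic
DEFINT A-Z
CONST BlockSize% = 16384           'bytes handled in each pass

Paragraphs = BlockSize% \ 16       'one segment unit spans 16 bytes
PRINT Paragraphs                   'the decimal form
PRINT "&H"; HEX$(Paragraphs)       'the same value in Hex

'Show that advancing the segment moves the physical address by one block.
Segment& = &H2000                  'an arbitrary example segment
Before& = Segment& * 16
After& = (Segment& + Paragraphs) * 16
PRINT After& - Before&
END
```
-
- The three values displayed are 1024, &H400, and 16384.
-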
-
-
- Processing Large Files
-
- Although the solutions shown so far are valuable when saving or loading
- large amounts of data, that is as far as they go. In many cases you will
- also need to process an entire existing file. Some examples are a program
- that copies or encrypts files, or a routine that searches an entire file
- for a string of text. As with saving and loading files, processing a file
- or portion of a file in large blocks is always faster and more effective
- than processing it line by line.
- The file copying subprogram below accepts source and destination file
- names, and copies the data in 4K blocks. The 4K size is significant,
- because it is large enough to avoid many repeated calls to DOS, and small
- enough to allow a conventional string to be used as a file buffer. As with
- the BigLoad and BigSave routines, the file is processed in pieces. Also,
- for simplicity a complete file name and path is required. Although the DOS
- COPY command lets you use a source file name and a destination drive or
- path only, the CopyFile subprogram requires that entire file names be given
- for both.
-
- DEFINT A-Z
- DECLARE SUB CopyFile (InFile$, OutFile$)
-
- SUB CopyFile (InFile$, OutFile$) STATIC
-
- File1 = FREEFILE
- OPEN InFile$ FOR BINARY AS #File1
-
- File2 = FREEFILE
- OPEN OutFile$ FOR BINARY AS #File2
-
- Remaining& = LOF(File1)
- DO
- IF Remaining& > 4096 THEN
- ThisPass = 4096
- ELSE
- ThisPass = Remaining&
- END IF
- Buffer$ = SPACE$(ThisPass)
- GET #File1, , Buffer$
- PUT #File2, , Buffer$
- Remaining& = Remaining& - ThisPass
- LOOP WHILE Remaining&
-
- CLOSE File1, File2
-
- END SUB
-
- Once the basic structure of a routine that processes an entire file has
- been established, it can be easily modified for other purposes. For
- example, CopyFile can be altered to encrypt an entire file, search a file
- for a text string, and so forth. A few of these will be shown here. Note
- that for simplicity and clarity, CopyFile creates a new buffer with each
- pass through the loop. You could avoid that by preceding the assignment
- with IF LEN(Buffer$) <> ThisPass THEN or similar logic, to avoid creating
- the buffer when it already exists and is the correct length.
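-
- Here is one way that change might look. The CopyFile2 name and the two
- file names are placeholders of my own. This sketch also tests Remaining&
- at the top of the loop so that nothing is read from a zero-length source
- file, and it clears the buffer on exit because a STATIC subprogram would
- otherwise keep the string in memory between calls.
-
```basic
DEFINT A-Z
DECLARE SUB CopyFile2 (InFile$, OutFile$)

CALL CopyFile2("SOURCE.DAT", "DEST.DAT")   'hypothetical file names
END

SUB CopyFile2 (InFile$, OutFile$) STATIC

  File1 = FREEFILE
  OPEN InFile$ FOR BINARY AS #File1

  File2 = FREEFILE
  OPEN OutFile$ FOR BINARY AS #File2

  Remaining& = LOF(File1)
  DO WHILE Remaining& > 0
    IF Remaining& > 4096 THEN
      ThisPass = 4096
    ELSE
      ThisPass = Remaining&
    END IF
    'create the buffer only when its length must change
    IF LEN(Buffer$) <> ThisPass THEN Buffer$ = SPACE$(ThisPass)
    GET #File1, , Buffer$
    PUT #File2, , Buffer$
    Remaining& = Remaining& - ThisPass
  LOOP

  CLOSE #File1, #File2
  Buffer$ = ""                     'release the buffer memory

END SUB
```
-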
- The BufIn function and example below serve as a very fast LINE INPUT
- replacement. Even though BASIC's own file input routines provide buffering
- for increased speed, they are not as effective as this function. In my
- measurements I have found BufIn to be consistently four to five times
- faster than BASIC's LINE INPUT routine when reading large (greater than
- 50K) files. With smaller files the improvement is less, but still
- substantial.
-
- DEFINT A-Z
- DECLARE FUNCTION BufIn$ (FileName$, Done)
-
- LINE INPUT "Enter a file name: ", FileName$
-
- '---- Show how fast BufIn$ reads the file.
- Start! = TIMER
- DO
- This$ = BufIn$(FileName$, Done)
- IF Done THEN EXIT DO
- LOOP
- Done! = TIMER
- PRINT "Buffered input: "; Done! - Start!
-
-
- '---- Now show how long BASIC's LINE INPUT takes.
- Start! = TIMER
- OPEN FileName$ FOR INPUT AS #1
- DO
- LINE INPUT #1, This$
- LOOP UNTIL EOF(1)
- Done! = TIMER
- PRINT " BASIC's INPUT: "; Done! - Start!
- CLOSE
- END
-
- FUNCTION BufIn$ (FileName$, Done) STATIC
-
- IF Reading GOTO Process 'now reading, jump in
-
- '----- initialization
- Reading = -1 'not reading so start now
- Done = 0 'clear Done just in case
- CR$ = CHR$(13) 'define for speed later
-
- FileNum = FREEFILE 'open the file
- OPEN FileName$ FOR BINARY AS #FileNum
-
- Remaining& = LOF(FileNum) 'byte count to be read
- IF Remaining& = 0 GOTO ExitFn 'empty or nonexistent file
-
- BufSize = 4096 'bytes to read each pass
- Buffer$ = SPACE$(BufSize) 'assume BufSize bytes
-
- DO 'the main outer loop
- IF Remaining& < BufSize THEN 'read only what remains
- BufSize = Remaining& 'resize the buffer
- IF BufSize < 1 GOTO ExitFn 'possible only if EOF byte
- Buffer$ = SPACE$(BufSize) 'create the file buffer
- END IF
- GET #FileNum, , Buffer$ 'read a block
-
- BufPos = 1 'start at the beginning
- DO 'walk through buffer
- CR = INSTR(BufPos, Buffer$, CR$) 'look for a Return
- IF CR THEN 'we found one
- SaveCR = CR 'save where
- BufIn$ = MID$(Buffer$, BufPos, CR - BufPos)
- BufPos = CR + 2 'skip inevitable LF
- EXIT FUNCTION 'all done for now
- ELSE 'back up in the file
- '---- if at the end and no CHR$(13) was found
- ' return what remains in the string
- IF SEEK(FileNum) >= LOF(FileNum) THEN
- Output$ = MID$(Buffer$, SaveCR + 2)
- '---- trap a trailing EOF marker
- IF RIGHT$(Output$, 1) = CHR$(26) THEN
- Output$ = LEFT$(Output$, LEN(Output$) - 1)
- END IF
- BufIn$ = Output$ 'assign the function
- GOTO ExitFn 'and exit now
- END IF
- Slop = BufSize - SaveCR - 1 'calc buffer excess
- Remaining& = Remaining& + Slop 'calc file excess
- SEEK #FileNum, SEEK(FileNum) - Slop
- END IF
-
- Process:
- LOOP WHILE CR 'while more in buffer
- Remaining& = Remaining& - BufSize
-
- LOOP WHILE Remaining& 'while more in the file
-
- ExitFn:
- Reading = 0 'we're not reading anymore
- Done = -1 'show that we're all done
- CLOSE #FileNum 'final clean-up
-
- END FUNCTION
-
- As you can see, the BufIn function opens the file, reads each line of text,
- and then closes the file and sets a flag when it has exhausted the text.
- Even though this example shows BufIn being invoked in a DO loop, it can be
- used in any situation where LINE INPUT would normally be used. As long as
- you declare the function, it may be added to programs of your own and used
- when sequential line-oriented data must be read as quickly as possible.
- I don't think each statement in the BufIn function warrants a complete
- explanation, but some of the less obvious aspects do. BufIn operates by
- reading the file in 4K blocks in an outer loop, and each block is then
- examined for a CHR$(13) line terminator in an inner loop that uses INSTR.
- INSTR happens to be extremely fast, and it is ideal when used this way to
- search a string for a single character.
- The only real complication is when only the first part of a line is at
- the end of the buffer, because that requires seeking backwards in the file
- to the start of that line. Other, less important complications that also
- must be handled
- arise from the presence of a CHR$(26) EOF marker, and a final string that
- has no terminating carriage return.
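-
- The inner INSTR loop is easier to follow in isolation. The sketch below
- uses a short string I made up in place of a real file buffer. It holds two
- complete lines and a final fragment with no carriage return, which is the
- "slop" that BufIn must either re-read or return as the last line.
-
```basic
DEFINT A-Z
CR$ = CHR$(13)
CRLF$ = CR$ + CHR$(10)

'A made-up buffer: two complete lines plus an unterminated fragment.
Buffer$ = "First" + CRLF$ + "Second" + CRLF$ + "Part"

BufPos = 1
DO
  CR = INSTR(BufPos, Buffer$, CR$)           'look for a Return
  IF CR = 0 THEN EXIT DO                     'no more complete lines
  PRINT MID$(Buffer$, BufPos, CR - BufPos)   'text up to the Return
  BufPos = CR + 2                            'skip the CR and the LF
LOOP

PRINT "Left over: "; MID$(Buffer$, BufPos)   'the unterminated fragment
END
```
-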
- I have made every effort to make this function as bullet-proof as
- possible; however, it is mandatory that every carriage return in the file
- be followed by a corresponding line feed. Some word processors eliminate
- the line feed to indicate a "soft return" at the end of a line, as opposed
- to the "hard return" that signifies the end of a paragraph. Most word
- processor files use a non-standard format anyway, so that should not be
- much of a problem.
- The last complete program I'll present here is called TEXTFIND.BAS,
- and it searches a group of files for a specified string. TEXTFIND is
- particularly useful when you need to find a document, and cannot remember
- its name. If you can think of a snippet of text the file might contain,
- TEXTFIND will identify which files contain that text, and then display it
- in context.
-
- '----- TEXTFIND.BAS
-
- 'Copyright (c) 1991 by Ethan Winer
-
- DEFINT A-Z
-
- TYPE RegTypeX 'used by CALL Interrupt
- AX AS INTEGER
- BX AS INTEGER
- CX AS INTEGER
- DX AS INTEGER
- BP AS INTEGER
- SI AS INTEGER
- DI AS INTEGER
- Flags AS INTEGER
- DS AS INTEGER
- ES AS INTEGER
- END TYPE
- DIM Registers AS RegTypeX 'holds the CPU registers
-
- TYPE DTA 'used by DOS services
- Reserved AS STRING * 21 'reserved for use by DOS
- Attribute AS STRING * 1 'the file's attribute
- FileTime AS STRING * 2 'the file's time
- FileDate AS STRING * 2 'the file's date
- FileSize AS LONG 'the file's size
- FileName AS STRING * 13 'the file's name
- END TYPE
- DIM DTAData AS DTA
-
- DECLARE SUB InterruptX (IntNumber, InRegs AS RegTypeX, OutRegs AS RegTypeX)
-
- CONST MaxFiles% = 1000
- CONST BufMax% = 4096
-
- REDIM Array$(1 TO MaxFiles%) 'holds the file names
- Zero$ = CHR$(0) 'do this once for speed
-
- '----- This function returns the larger of two integers.
- DEF FNMax% (Value1, Value2)
- FNMax% = Value1
- IF Value2 > Value1 THEN FNMax% = Value2
- END DEF
-
- '----- This function loads a group of file names.
- DEF FNLoadNames%
-
- STATIC Count
-
- '---- define a new Disk Transfer Area for DOS
- Registers.DX = VARPTR(DTAData)
- Registers.DS = VARSEG(DTAData)
- Registers.AX = &H1A00
- CALL InterruptX(&H21, Registers, Registers)
-
- Count = 0 'zero the file counter
- Spec$ = Spec$ + Zero$ 'DOS needs an ASCIIZ string
- Registers.DX = SADD(Spec$) 'show where the spec is
- Registers.DS = SSEG(Spec$) 'use this with PDS
- 'Registers.DS = VARSEG(Spec$) 'use this with QB
- Registers.CX = 39 'the attribute for any file
- Registers.AX = &H4E00 'find file name service
-
- '---- Read the file names that match the search specification. The Flags
- ' register indicates when no more matching files are found. Copy
- ' each file name to the string array. Service &H4F is used to
- ' continue the search started with service &H4E using the same file
- ' specification.
- DO
- CALL InterruptX(&H21, Registers, Registers)
- IF Registers.Flags AND 1 THEN EXIT DO
- Count = Count + 1
- Array$(Count) = DTAData.FileName
- Registers.AX = &H4F00
- LOOP WHILE Count < MaxFiles%
-
- FNLoadNames% = Count 'return the number of files
-
- END DEF
-
- '----- The main body of the program begins here.
- PRINT "TEXTFIND Copyright (c) 1991, Ziff-Davis Press."
- PRINT
-
- '---- Get the file specification, or prompt for one if it wasn't given.
- Spec$ = COMMAND$
- IF LEN(Spec$) = 0 THEN
- PRINT "Enter a file specification: ";
- INPUT "", Spec$
- END IF
-
- '----- Ask for the search string to find.
- PRINT " Enter the text to find: ";
- INPUT Find$
- PRINT
-
- Find$ = UCASE$(Find$) 'ignore capitalization
- FindLength = LEN(Find$) 'see how long Find$ is
- IF FindLength = 0 THEN END
-
- Count = FNLoadNames% 'load the file names
- IF Count = 0 THEN
- PRINT "No matching files"
- END
- END IF
-
- '----- Isolate the drive and path if given.
- FOR X = LEN(Spec$) TO 1 STEP -1
- Char = ASC(MID$(Spec$, X))
- IF Char = 58 OR Char = 92 THEN '":" or "\"
- Path$ = LEFT$(UCASE$(Spec$), X)
- EXIT FOR
- END IF
- NEXT
-
- FOR X = 1 TO Count 'for each matching file
- Array$(X) = LEFT$(Array$(X), INSTR(Array$(X), Zero$) - 1)
- PRINT "Reading "; Path$; Array$(X)
- OPEN Path$ + Array$(X) FOR BINARY AS #1
- Length& = LOF(1) 'get and save its length
- IF Length& < FindLength GOTO NextFile
-
- BufSize = BufMax% 'assume a 4K text buffer
- IF BufSize > Length& THEN BufSize = Length&
- Buffer$ = SPACE$(BufSize) 'create the file buffer
-
- LastSeek& = 1 'seed the SEEK location
- BaseAddr& = 1 'and the starting offset
- Bytes = 0 'how many bytes to search
-
- DO 'the file read loop
- BaseAddr& = BaseAddr& + Bytes 'track block start
- IF Length& - LastSeek& + 1 >= BufSize THEN
- Bytes = BufSize 'at least BufSize bytes left
- ELSE 'get just what remains
- Bytes = Length& - LastSeek& + 1
- Buffer$ = SPACE$(Bytes) 'adjust the buffer size
- END IF
-
- SEEK #1, LastSeek& 'seek back in the file
- GET #1, , Buffer$ 'read a chunk of the file
-
- Start = 1 'this is the INSTR loop for
- DO 'searching within the buffer
- Found = INSTR(Start, UCASE$(Buffer$), Find$)
- IF Found THEN 'print it in context
- Start = Found + 1 'to resume using INSTR later
- PRINT 'add a blank line for clarity
- PRINT MID$(Buffer$, FNMax%(1, Found - 20), FindLength + 40)
- PRINT
-
- PRINT "Continue searching "; Array$(X);
- PRINT "? (Yes/No/Skip): ";
- WHILE INKEY$ <> "": WEND 'clear kbd buffer
- DO
- KeyHit$ = UCASE$(INKEY$) 'then get a response
- LOOP UNTIL KeyHit$ = "Y" OR KeyHit$ = "N" OR KeyHit$ = "S"
- PRINT KeyHit$ 'echo the letter
- PRINT
-
- IF KeyHit$ = "N" THEN '"No"
- END 'end the program
- ELSEIF KeyHit$ = "S" THEN '"Skip"
- GOTO NextFile 'go to the next file
- END IF
-
- END IF
- 'search for multiple hits
- LOOP WHILE Found 'within the file buffer
-
- IF Bytes = BufSize THEN 'still more file to examine
- '---- Back up a bit in case Find$ is there but straddling the buffer
- ' boundary. Then update the internal SEEK pointer.
- BaseAddr& = BaseAddr& - FindLength
- LastSeek& = BaseAddr& + Bytes
- END IF
-
- LOOP WHILE Bytes = BufSize AND BufSize = BufMax%
-
- NextFile:
- CLOSE #1
- Buffer$ = "" 'clear the buffer for later
-
- NEXT
- END
-
- TEXTFIND may be run either in the BASIC editor or compiled to an executable
- file and then run. If you are using QuickBASIC you will need either QB.QLB
- or QB.LIB because the program relies on CALL Interrupt to interface with
- DOS. To start QB and load the QB.QLB library simply enter qb /l. If you
- are compiling the program, specify the QB.LIB file when it is linked:
-
- link textfind , , nul , qb;
-
- For BASIC 7 users the appropriate library names are QBX.QLB and QBX.LIB
- respectively. [And for VB/DOS the libraries are VBDOS.QLB and VBDOS.LIB.]
- When you run TEXTFIND you may either enter a file specification such
- as *.BAS or LET*.TXT or the like as a command line argument, or enter
- nothing and let the program prompt you. In either case, you will then be
- asked to enter the text string you're searching for. TEXTFIND will search
- through every file that matches the file specification, and display the
- string in context if it is found.
- As written, TEXTFIND shows the 20 characters before and after the
- string. You may of course modify that to any reasonable number of
- characters. Simply change the 20 and 40 values in the corresponding PRINT
- statement. The first value is the number of characters on either side to
- display, and the second must be twice that to accommodate the length of the
- search string itself. Note the use of FNMax% which ensures that the
- program will not try to print characters before the start of the buffer.
- If the text were found at the very start of the file, attempting to print
- the 20 characters that precede it would create an "Illegal function call"
- error at the MID$ function.
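-
- The clamping that FNMax% performs can be seen in this small sketch, which
- uses a plain IF test and a made-up buffer instead of the function. The
- match is at position 5, so the starting point would otherwise be -15.
-
```basic
DEFINT A-Z
Buffer$ = "The quick brown fox jumps over the lazy dog"
Find$ = "QUICK"

Found = INSTR(UCASE$(Buffer$), Find$)
IF Found THEN
  Start = Found - 20                 'back up 20 characters
  IF Start < 1 THEN Start = 1        'but never before the first one
  PRINT MID$(Buffer$, Start, LEN(Find$) + 40)
END IF
END
```
-
- MID$ quietly returns fewer characters when the requested length runs past
- the end of the string, so only the starting position needs this guard.
-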
- Each time the string is found and displayed you are offered the
- opportunity to continue searching the same file, ending the program, or
- skipping to the next file.
- Although CALL Interrupt will be discussed in depth in Chapter 12,
- there are several aspects of the program's operation that require
- elaboration here. First, any program that uses the DOS Find First and Find
- Next services to read a list of file names must establish a small block of
- memory as a Disk Transfer Area (DTA). The DTA holds pertinent information
- about each file that is found, such as its date, time, size, and attribute.
- In this case, though, we are merely interested in each file's name. DOS
- service &H1A is used to assign the DTA to a TYPE variable that is designed
- to facilitate extracting this information. BASIC PDS [and VB/DOS] include
- the DIR$ function which lets you read file names, but I have used CALL
- Interrupt here so the program will also work with QuickBASIC.
- Second, DEF FN-style functions are used instead of formal functions
- because they are smaller and slightly faster. The FNLoadNames function is
- responsible for loading all of the file names into the string array, and it
- returns the number of files that were found. After each call to DOS to
- find the next matching name, the Carry flag is tested. DOS often uses the
- Carry flag to indicate the success or failure of an operation, and in this
- case it is set to True when there are no more files.
- Note how a CHR$(0) is appended to the file specification when calling
- DOS, to indicate the end of the string. Similarly, DOS returns each file
- name terminated with a zero byte, and INSTR is used to find that byte.
- Then, only those characters to the left of the zero are kept using LEFT$.
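-
- Both halves of that zero-byte handling are shown in the sketch below. The
- file specification and the returned name are invented for the example, and
- no call to DOS is actually made.
-
```basic
DEFINT A-Z
Zero$ = CHR$(0)

'Going to DOS: append a zero byte to mark the end of the string.
Spec$ = "*.BAS" + Zero$
PRINT LEN(Spec$)                   'six bytes including the zero

'Coming from DOS: keep only what precedes the zero byte.
Raw$ = "TEXTFIND.BAS" + Zero$      'a made-up 13-byte name field
Zero = INSTR(Raw$, Zero$)
IF Zero THEN Raw$ = LEFT$(Raw$, Zero - 1)
PRINT Raw$; LEN(Raw$)              'the name and its true length
END
```
-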
- Third, the block of code that isolates the drive and path name if
- given is needed because the DOS Find services return only a file name. If
- you enter D:\ANYDIR\*.* as a file specification, that is then passed to
- DOS. But DOS returns only the names it finds that match the specification.
- Therefore, the drive and path must be added to the beginning of each name,
- to create a complete file name for the subsequent OPEN command.
- Finally, as with the BufIn function, the files are read in 4K (4096-
- byte) blocks, except for the last block which of course may be smaller. A
- smaller block is also used when the file is less than 4K in length. Within
- each outer read loop, an inner loop is employed to search for the text, and
- again INSTR is used because of its speed. As written, TEXTFIND looks for
- the specified string without regard to capitalization. You can remove that
- feature by eliminating the UCASE$ function in both the INSTR loop, and at
- the point in the program where Find$ is capitalized.
-
-
- MINIMIZING DISK USAGE
-
- While improving your program's performance is certainly a desirable
- pursuit, equally important is minimizing the amount of space needed to
- store data. Besides the obvious savings in disk space, the less data there
- is, the faster it can be loaded and saved. There are a number of simple
- tricks you can use to reduce the size of your data files, and some types of
- data lend themselves quite nicely to compaction techniques.
- Date information is particularly easy to reduce. At the minimum, you
- should remove the separating slashes or dashes--perhaps with a dedicated
- function. For example, you would convert "06-22-91" to "062291". Even
- better, however, is to convert each digit pair to an equivalent single
- CHR$() byte, and also rearrange the order of the fields. That is, the
- date above would be packed to CHR$(91) + CHR$(6) + CHR$(22). By placing
- the year first followed by the month and then the day, packed dates may
- also be compared directly. Otherwise, a normal string comparison would
- show the date "01-01-91" as being less (earlier) than "12-31-90" even
- though it is in fact greater (later). A complementary function would then
- extract the ASCII values into a date string suitable for display. These
- are shown below.
-
- DEFINT A-Z
- DECLARE FUNCTION PackDate$ (D$)
- DECLARE FUNCTION UnPackDate$ (D$)
-
- D$ = "03-22-91"
- Packed$ = PackDate$(D$)
- UnPacked$ = UnPackDate$(Packed$)
-
- PRINT D$
- PRINT Packed$
- PRINT UnPacked$
- END
-
- FUNCTION PackDate$ (D$) STATIC
- Year = VAL(RIGHT$(D$, 2))
- Month = VAL(LEFT$(D$, 2))
- Day = VAL(MID$(D$, 4, 2))
- PackDate$ = CHR$(Year) + CHR$(Month) + CHR$(Day)
- END FUNCTION
-
- FUNCTION UnPackDate$ (D$) STATIC
- Month$ = LTRIM$(STR$(ASC(MID$(D$, 2, 1))))
- Day$ = LTRIM$(STR$(ASC(RIGHT$(D$, 1))))
- Year$ = LTRIM$(STR$(ASC(LEFT$(D$, 1))))
- UnPackDate$ = RIGHT$("0" + Month$, 2) + "-" + RIGHT$("0" + Day$, 2) + _
- "-" + RIGHT$("0" + Year$, 2)
- END FUNCTION
-
- Because a compacted date may contain a CHR$(26) byte, which is used by
- DOS and BASIC as an EOF marker, or a CHR$(13) or CHR$(10) that would end a
- line prematurely, this method is useful only with random access and binary
- data files. But since it is usually large database files that need the
- most help anyway, these functions are ideal.
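- The benefit of storing the year first can be seen in the short sketch
- below, which compares two dates both as entered and after packing. The
- PackDate$ function is repeated from the listing above so the example can
- be run by itself; the variable names Early$ and Late$ are invented for
- this illustration.
-
- DEFINT A-Z
- DECLARE FUNCTION PackDate$ (D$)
-
- Early$ = "12-31-90"
- Late$ = "01-01-91"
-
- IF Early$ < Late$ THEN
-   PRINT "Unpacked: "; Early$; " sorts first"
- ELSE
-   PRINT "Unpacked: "; Late$; " sorts first"
- END IF
-
- IF PackDate$(Early$) < PackDate$(Late$) THEN
-   PRINT "Packed:   "; Early$; " sorts first"
- ELSE
-   PRINT "Packed:   "; Late$; " sorts first"
- END IF
- END
-
- FUNCTION PackDate$ (D$) STATIC        'same as the listing above
-   Year = VAL(RIGHT$(D$, 2))
-   Month = VAL(LEFT$(D$, 2))
-   Day = VAL(MID$(D$, 4, 2))
-   PackDate$ = CHR$(Year) + CHR$(Month) + CHR$(Day)
- END FUNCTION
-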
- Another useful database compaction technique is to replace selected
- strings with an equivalent integer or byte value. The commercial database
- program *DataEase* uses a very clever trick to implement multiple choice
- fields. It is not uncommon to have a string field that contains, say, an
- income or expense category. For example, most businesses are required to
- indicate the purpose of each check that is written. Instead of using a
- string field and requiring the operator to type Entertainment, Payroll, or
- whatever, a menu can be popped up showing a list of possible choices.
- Assuming there are no more than 256 possibilities, the choice number
- that was entered can be stored on disk in a single byte. You would use
- something like FileType.Choice = CHR$(MenuChoice), where the Choice portion
- of the file type was defined as STRING * 1. Then to extract the choice
- after a record was read you would use MenuChoice = ASC(FileType.Choice).
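- A minimal sketch of this idea follows. The CheckRec type, its field
- names, and the CHECKS.DAT file name are assumptions made up for the
- example; only the CHR$ and ASC conversions come from the discussion above.
- The Choice field is a single byte, so it can hold values from 0 to 255.
-
- DEFINT A-Z
- TYPE CheckRec
-   Payee AS STRING * 20
-   Amount AS DOUBLE
-   Choice AS STRING * 1
- END TYPE
- DIM Check AS CheckRec
-
- MenuChoice = 3                      'as returned by a menu routine
- Check.Payee = "Office Supply Co."
- Check.Amount = 42.5#
- Check.Choice = CHR$(MenuChoice)     'pack the choice into one byte
-
- OPEN "CHECKS.DAT" FOR RANDOM AS #1 LEN = LEN(Check)
- PUT #1, 1, Check
- GET #1, 1, Check
- CLOSE #1
-
- MenuChoice = ASC(Check.Choice)      'unpack it after reading
- PRINT "Category number:"; MenuChoice
-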
- Some database programs support Memo Fields, whereby the user can enter
- a varying amount of memo information. Since database files almost always
- use a fixed length for each record, this presents a programming dilemma:
- How much space do you set aside for the memo field? If you set aside too
- little, the user won't be very pleased. But setting aside enough to
- accommodate the longest possible string is very wasteful of disk space.
- One good solution is to store a long integer pointer in each record,
- and keep the memos themselves in a separate file. A long integer requires
- only four bytes of storage, yet it can hold a seek location for memo data
- kept in a separate file whose size can be greater than 2000 MB! As each
- new memo is entered, the current length [derived using LOF] of the memo
- file plus one is written in the current record of the data file, because
- BASIC numbers the bytes in a file starting at 1. The memo string is then
- appended to the memo file. When you want to retrieve the memo, simply
- seek to the long integer offset held in the main data record and use LINE
- INPUT to read the string from the memo file.
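- The steps just described might be coded along the following lines. This
- is only a sketch: the CustRec type, the file names, and the use of record
- number 1 are all assumptions for the sake of a short example. It stores
- LOF + 1 as the pointer because SEEK counts the first byte of a file as
- position 1, and it assumes that a memo never contains a carriage return,
- since LINE INPUT stops reading there.
-
- DEFINT A-Z
- TYPE CustRec
-   CustName AS STRING * 30
-   MemoPtr AS LONG
- END TYPE
- DIM Cust AS CustRec
-
- '----- Note where the new memo will start, then append it.
- Memo$ = "Prefers delivery before noon."
- OPEN "MEMOS.DAT" FOR APPEND AS #1
- Cust.MemoPtr = LOF(1) + 1           'SEEK positions are 1-based
- PRINT #1, Memo$                     'PRINT # adds the CR/LF terminator
- CLOSE #1
-
- '----- Save the pointer in the main data record.
- Cust.CustName = "Acme Hardware"
- OPEN "CUSTOMER.DAT" FOR RANDOM AS #1 LEN = LEN(Cust)
- PUT #1, 1, Cust
- GET #1, 1, Cust
- CLOSE #1
-
- '----- Later, retrieve the memo using the saved pointer.
- OPEN "MEMOS.DAT" FOR INPUT AS #1
- SEEK #1, Cust.MemoPtr
- LINE INPUT #1, Found$
- CLOSE #1
- PRINT Found$
-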
- The only real complication with this method is when a memo field must
- be edited. There's no reasonable way to lengthen or shorten data in the
- middle of a file, and no reasonable program would even try. Instead, you
- would simply overwrite the existing data with special values--perhaps with
- CHR$(255) bytes--and then append the new memo to the end of the file.
- Periodically you would have to run a utility program that copied only the
- valid memo fields to a new file, and then delete the old file. Be aware
- that you will also have to update the long integer pointers in the main
- data file, to reflect the new offsets of their corresponding memo fields.
- The last data size reduction technique is probably the simplest of
- all, and that is to use the appropriate type of data and file access
- method. If you can get by with a single precision variable, don't use a
- double precision. And if the range of integer values is sufficient, use
- those. Many programmers automatically use single precision variables
- without even thinking about it, when a smaller data type would suffice.
- Finally, avoid using sequential files to store numeric data. As I
- already pointed out, an integer can be stored in a binary file in only two
- bytes--no matter what its value--compared to as many as eight bytes needed
- to store the equivalent digits, possible minus sign, and a terminating
- carriage return and line feed. Be creative, and don't be afraid to invent
- a method that is suited to your particular application. The Lotus format
- is a good one for many other applications, whereby a size and type code
- precedes each piece of information. If your needs are modest you can
- probably get away with a single byte as a type code, further reducing the
- amount of storage that is needed.
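-
- To see the difference for yourself, the short test below writes the same
- integer to a sequential file and to a binary file, and then reports the
- size of each. The SIZETEST file names are arbitrary, and WRITE # is used
- here instead of PRINT # so that no padding spaces are added.
-
- DEFINT A-Z
- Value = -12345
-
- OPEN "SIZETEST.SEQ" FOR OUTPUT AS #1
- WRITE #1, Value                     'digits, minus sign, and CR/LF
- CLOSE #1
-
- OPEN "SIZETEST.BIN" FOR BINARY AS #1
- PUT #1, , Value                     'always two bytes for an integer
- CLOSE #1
-
- OPEN "SIZETEST.SEQ" FOR BINARY AS #1
- PRINT "Sequential file:"; LOF(1); "bytes"
- CLOSE #1
-
- OPEN "SIZETEST.BIN" FOR BINARY AS #1
- PRINT "Binary file:    "; LOF(1); "bytes"
- CLOSE #1
-
- KILL "SIZETEST.*"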
-
-
- AVOIDING BASIC'S LIMITATIONS
-
- So far I have focused on improving what BASIC already does. I showed
- techniques for speeding up file accesses, and reducing the size of your
- data. I even showed how to overcome BASIC's unwillingness to directly
- write binary data larger than a single variable. But there are other BASIC
- limitations that can be overcome as well.
- One important limitation is that BASIC lets you run only .EXE files
- with the RUN statement. If you need to execute a .COM program or a batch
- file, BASIC will not let you. However, you can trick DOS into believing a
- .COM program or batch file's name was entered at the DOS prompt. The
- StuffBuffer subprogram shown below inserts a string of up to 15 characters
- (14 plus a final Enter) directly into the keyboard buffer. It works by
- poking each character one by one into the buffer address in low memory.
- Thus, when your program ends the characters are there as if someone had
- typed them manually.
-
- DEFINT A-Z
- DECLARE SUB StuffBuffer (Cmd$)
-
- SUB StuffBuffer (Cmd$) STATIC
-
- '----- Limit the string to 14 characters plus Enter and save the length.
- Work$ = LEFT$(Cmd$, 14) + CHR$(13)
- Length = LEN(Work$)
-
- '----- Set the segment for poking, define the buffer head and tail, and
- ' then poke each character.
- DEF SEG = 0
- POKE 1050, 30
- POKE 1052, 30 + Length * 2
- FOR X = 1 TO Length
- POKE 1052 + X * 2, ASC(MID$(Work$, X))
- NEXT
-
- END SUB
-
- To run a .COM program or batch file simply call StuffBuffer and end the
- program:
-
- CALL StuffBuffer("PROGRAM"): END
-
- A terminating carriage return is added to the command, to include a final
- Enter keypress. Because the keyboard buffer holds only 15 characters, you
- cannot specify long path names when using StuffBuffer. However, you can
- easily open and write a short batch file with the complete path and file
- name, and run the batch file instead.
- Notice that this technique will not work if the original BASIC program
- itself has been run from a batch file, because that batch file gains
- control when the program ends. Also, when creating a batch file that will
- be run by StuffBuffer, it is imperative that the last line *not* have a
- terminating carriage return. The short example below shows the correct way
- to create and run a batch file for use with StuffBuffer.
-
-
- OPEN "MYBAT.BAT" FOR OUTPUT AS #1
- PRINT #1, "cd \somedir"
- PRINT #1, "someprog";
- CLOSE
- CALL StuffBuffer("MYBAT")
- END
-
-
- You can also have the batch file re-run the BASIC program by entering its
- name as the last line in the batch file. In that case you would include
- the semicolon at the end of that line, instead of the line that runs the
- program. Note that StuffBuffer is an ideal replacement for BASIC's SHELL
- command, because with SHELL your BASIC program remains in memory while the
- subsequent program is run. Using StuffBuffer with a batch file removes the
- BASIC program entirely, thus freeing up all available system memory for the
- program being run.
- Understand that StuffBuffer cannot be used to activate a TSR or other
- program that monitors keyboard interrupt 9. This limitation also extends
- to the special key sequences that enable the Turbo mode on some PC
- compatibles, and simulating Ctrl-Esc to activate the DOS compatibility box
- of OS/2. Programs that look for these special keys insert themselves into
- the keyboard chain *before* the keyboard buffer, and act on them before the
- BIOS has the chance to store them in the buffer.
- Another BASIC limitation is that only 15 files may be open at one
- time. In truth, this is really a DOS limitation, and indeed, the fix
- requires a DOS interrupt service. It is also possible to reduce the number
- of files open at once by combining data. For example, the BASIC PDS ISAM
- file manager uses this technique to store both the data and its indexes all
- in the same file. But doing that requires more complication than many
- programmers are willing to put up with.
- The program below shows how to increase the number of files that DOS
- will let you open. Be aware that the DOS service that performs this magic
- requires at least version 3.3, and this program tests for that.
-
- DEFINT A-Z
- DECLARE SUB Interrupt (IntNum, InRegs AS ANY, OutRegs AS ANY)
- DECLARE SUB MoreFiles (NumFiles)
- DECLARE FUNCTION DOSVer% ()
-
- TYPE RegType
- AX AS INTEGER
- BX AS INTEGER
- CX AS INTEGER
- DX AS INTEGER
- BP AS INTEGER
- SI AS INTEGER
- DI AS INTEGER
- Flags AS INTEGER
- END TYPE
- DIM SHARED InRegs AS RegType, OutRegs AS RegType
-
- ComSpec$ = ENVIRON$("COMSPEC")
- BootDrive$ = LEFT$(ComSpec$, 2)
- OPEN BootDrive$ + "\CONFIG.SYS" FOR INPUT AS #1
- DO WHILE NOT EOF(1)
- LINE INPUT #1, Work$
- Work$ = UCASE$(Work$)
- IF LEFT$(Work$, 6) = "FILES=" THEN
- FilesVal = VAL(MID$(Work$, 7))
- EXIT DO
- END IF
- LOOP
- CLOSE
-
- INPUT "How many files? ", NumFiles
- NumFiles = NumFiles + 5
- IF NumFiles > FilesVal THEN
- PRINT "Increase the FILES= setting in CONFIG.SYS"
- END
- END IF
-
- IF DOSVer% >= 330 THEN
- CALL MoreFiles(NumFiles)
- ELSE
- PRINT "Sorry, DOS 3.3 or later is required."
- END
- END IF
-
- FOR X = 1 TO NumFiles - 5 'open only the number of files requested
- OPEN "FTEST" + LTRIM$(STR$(X)) FOR RANDOM AS #X
- NEXT
- CLOSE
- KILL "FTEST*."
- END
-
- FUNCTION DOSVer% STATIC
- InRegs.AX = &H3000
- CALL Interrupt(&H21, InRegs, OutRegs)
- Major = OutRegs.AX AND &HFF
- Minor = OutRegs.AX \ &H100
- DOSVer% = Minor + 100 * Major
- END FUNCTION
-
- SUB MoreFiles (NumFiles) STATIC
- InRegs.AX = &H6700
- InRegs.BX = NumFiles
- CALL Interrupt(&H21, InRegs, OutRegs)
- END SUB
-
- As with the TEXTFIND program, this also uses CALL Interrupt and therefore
- requires QB.LIB to compile and link, or QB.QLB to run in the QuickBASIC
- environment. Even though DOS allows you to increase the number of files
- past the default 15, an appropriate FILES= statement must also be added to
- the PC's CONFIG.SYS file. In fact, the FILES= value must be five greater
- than the desired number of files, because DOS reserves the first five for
- itself. The reserved files [devices] are PRN, AUX, STDIN, STDOUT, and
- STDERR. PRN is of course the printer connected to LPT1, AUX is the first
- COM port, and the remaining devices are all part of the CON console device.
- In order to find the CONFIG.SYS file this program uses the ENVIRON$
- function to retrieve the current COMSPEC= setting. Unless someone has
- changed it on purpose, the COMSPEC environment variable holds the drive and
- path from which the PC was booted, and the file name "COMMAND.COM". Then
- each line in CONFIG.SYS is examined for the string "FILES=", to ensure that
- enough file entries were specified. This program makes only a minimal
- attempt to identify the "FILES=" string, so if there are extra spaces such
- as "FILES = 30" the test will fail.
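- If you need a more forgiving test, one possible approach is to strip the
- blanks from each line before looking for "FILES=". The FilesValue%
- function below is a hypothetical helper, not part of the program above;
- it returns the FILES= setting, or -1 if the line holds some other command.
-
- DEFINT A-Z
- DECLARE FUNCTION FilesValue% (Work$)
-
- PRINT FilesValue%("files = 30")
- PRINT FilesValue%("FILES=40")
- PRINT FilesValue%("BUFFERS=20")
- END
-
- FUNCTION FilesValue% (Work$) STATIC
-   Temp$ = ""                        'build a copy without blanks
-   FOR X = 1 TO LEN(Work$)
-     Char$ = MID$(Work$, X, 1)
-     IF Char$ <> " " AND Char$ <> CHR$(9) THEN Temp$ = Temp$ + Char$
-   NEXT
-   Temp$ = UCASE$(Temp$)
-   IF LEFT$(Temp$, 6) = "FILES=" THEN
-     FilesValue% = VAL(MID$(Temp$, 7))
-   ELSE
-     FilesValue% = -1                'not a FILES= line
-   END IF
- END FUNCTION
-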
- Next the DOS version is tested to ensure that it is version 3.3 or
- later. The DOSVer function is designed to return the DOS version as an
- integer value 100 times higher than the actual version number. That is,
- DOS 2.14 is returned as 214, and DOS 3.30 is instead 330. This eliminates
- the floating point math required to return a value such as 2.14 or 3.3,
- resulting in less code and faster operation.
- Assuming the FILES= setting is sufficiently high and the DOS version
- is at least 3.30, the program creates and then deletes the specified number
- of files just to show it worked. You should be aware that the BASIC editor
- must also open files when it saves your program. I mention this because it
- is possible to be experimenting with a program such as this one, and not be
- able to save your work because the maximum allowable number of files are
- already open. In that case BASIC issues a "Too many files" error message,
- and refuses to let you save. The solution is to press F6 to go to the
- Immediate window, and then type CLOSE.
- A similar situation happens when you try to shell to DOS from the
- BASIC editor, because shelling requires BASIC to open COMMAND.COM. But an
- unsuccessful shell results in an "Illegal function call" error. That
- message is particularly exasperating when BASIC's SHELL fails, because the
- failure is usually caused by insufficient memory or because COMMAND.COM
- cannot be located. Why Microsoft chose to return "Illegal function call"
- rather than "Out of memory", "File not found", or "Too many files" is
- anyone's guess.
- Another important BASIC limitation that can be overcome only with
- clever trickery is its inability to "map" multiple variables to the same
- memory address. This is an important feature of the C language, and it has
- some important applications. For example, if you are frequently accessing
- a group of characters in the middle of a string, you must use MID$ each
- time you assign or retrieve them. Unfortunately, MID$ is very slow because
- it always extracts a copy of the specified characters, even if you are
- merely printing them. If only BASIC would let you create a new string that
- always referred to that group of characters in the first string, the access
- speed could be greatly improved.
- The FIELD statement lets you do exactly this, and each time a new
- FIELD statement is encountered the same area of memory is referred to. The
- short example below shows the tremendous speed improvement possible only
- when two variables can occupy the same address. An additional trick used
- here is to open the DOS reserved "\DEV\NUL" device. This eliminates any
- disk access, and avoids also having to create an empty file just to
- implement the FIELD statement.
-
- DEFINT A-Z
-
- OPEN "\DEV\NUL" FOR RANDOM AS #1 LEN = 30
- FIELD #1, 10 AS First$, 10 AS Middle$, 10 AS Last$
- FIELD #1, 30 AS Entire$
- LSET Entire$ = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234"
- Start! = TIMER
- FOR X = 1 TO 20000
- Temp = ASC(Middle$)
- NEXT
- Done! = TIMER
- PRINT USING "##.### seconds for FIELD"; Done! - Start!
- CLOSE
-
- Entire$ = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234"
- Start! = TIMER
- FOR X = 1 TO 20000
- Temp = ASC(MID$(Entire$, 11, 10))
- NEXT
- Done! = TIMER
- PRINT USING "##.### seconds for MID$"; Done! - Start!
-
- As you can see, accessing Middle$ as defined in the FIELD statement is more
- than three times faster than accessing the middle portion of Entire$ using
- MID$. There are no doubt other situations where it is useful to treat the
- same area of memory as different variables, perhaps to provide different
- views [such as numeric and string] of the same data. We can only hope that
- Microsoft will see fit to add this important capability to a future version
- of BASIC. [PowerBASIC offers this feature via the UNION command.]
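- As an illustration of such numeric and string views, the sketch below
- fields the same two bytes both as a packed integer and as two separate
- characters. It relies on the same NUL device trick used above, and the
- names IntView$, LoByte$, and HiByte$ are simply ones I made up here.
-
- DEFINT A-Z
- OPEN "\DEV\NUL" FOR RANDOM AS #1 LEN = 2
- FIELD #1, 2 AS IntView$
- FIELD #1, 1 AS LoByte$, 1 AS HiByte$
- LSET IntView$ = MKI$(&H1234)
- PRINT "Low byte:  "; HEX$(ASC(LoByte$))
- PRINT "High byte: "; HEX$(ASC(HiByte$))
- PRINT "Integer:   "; HEX$(CVI(IntView$))
- CLOSE
-
- Because BASIC stores integers with the low byte first, LoByte$ should
- report 34 and HiByte$ should report 12.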
- The NUL device has other important applications in conjunction with
- FIELD. A programming problem that comes up frequently is being able to
- format numbers to a controlled number of decimal places. Although BASIC's
- PRINT USING will format a number and write it to the screen, there is no
- way to actually access the formatted value. It is possible to have PRINT
- USING write the value on the screen--perhaps in the upper left corner with
- a color setting of black on black--and then read it character by character
- with SCREEN. But that method is clunky at best, and also very slow.
- The short program below uses PRINT # USING to write to a fielded
- buffer, and then LINE INPUT # to read the number back from the buffer.
-
-
- Value# = 123.45678#
-
- OPEN "\DEV\NUL" FOR RANDOM AS #1 LEN = 15
- FIELD #1, 15 AS Format$
- PRINT #1, USING "####.##"; Value#
- LINE INPUT #1, Fmt$
-
- PRINT " Value:"; Value#
- PRINT "Formatted:"; Fmt$
-
-
- Notice that the field buffer must be long enough to receive the entire
- formatted string, including the carriage return and line feed that BASIC
- sends as part of the PRINT # statement. This technique opens up many
- exciting possibilities, especially when used in conjunction with PRINT #
- USING's other extensive formatting options.
- [PDS includes the FORMAT$ function externally in Quick and regular
- link libraries, and VB/DOS goes a step further by adding FORMAT$ to the
- language. But FORMAT$ offers only a subset of what PRINT USING can do.]
-
-
- ADVANCED DEVICE TECHNIQUES
- ==========================
-
- As many tricks as there are for reading and writing files, there are just
- as many for accessing devices. Many devices such as printers and modems
- are so much slower than BASIC that the techniques for sending large amounts
- of data in one operation are not needed or useful. But these devices offer
- a whole new set of problems that just beg for clever programming solutions.
- With that in mind, let's continue this tour and examine some of the less
- obvious aspects of BASIC's device handling capabilities.
-
-
- THE PRINTER DEVICE
-
- All modern printers accept special control codes to enable and disable
- underlining, boldfacing, italics, and sometimes even font changes. Many
- printers honor the standard Epson/IBM control codes, and some recognize
- additional codes to control unique features available only with that brand
- or model. However, it is possible to print underline and boldface text
- with most printers, without regard to the particular model. The examples
- shown below require that you open the printer as a device using "LPT1:BIN".
- If you are using LPT2, of course, then you will open "LPT2:BIN" instead.
- As I mentioned earlier, the BIN option tells BASIC not to interfere with
- any control codes you send, and also not to add automatic line wrapping.
- Most programmers assume that every carriage return is always
- accompanied by a corresponding line feed, and indeed, that is almost always
- the case. Even if you print a CHR$(13) carriage return followed by a
- semicolon, BASIC steps in and appends a line feed for you. But these are
- separate characters, and each can be used separately to control a printer.
- The example below prints a short string and a carriage return *without* a
- line feed, and then prints a series of underlines beneath the string.
-
-
- OPEN "LPT1:BIN" FOR OUTPUT AS #1
- PRINT #1, "BASIC Techniques and Utilities"; CHR$(13);
- PRINT #1, SPACE$(6); "__________" 'six blanks, then underline one word
- CLOSE
-
-
- Similarly, you can also simulate boldfacing by printing the same string at
- the same place on the paper two or three times. While this won't work with
- a laser printer, it is very effective on dot matrix printers. Of course,
- if you do know the correct control codes for the printer, then those can be
- sent directly. Be sure, however, to always include a trailing semicolon as
- part of the print statement, to avoid also sending an unwanted return and
- line feed. For example, to advance a printer to the start of the next page
- you would use either PRINT #1, CHR$(12); or LPRINT CHR$(12);. In this
- case, a normal LPRINT will work because you are not sending a CHR$(13) or
- CHR$(10).
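- The overprinting method for simulating boldface might look like this.
- It is a sketch only; the text and the number of passes are arbitrary.
-
- OPEN "LPT1:BIN" FOR OUTPUT AS #1
- Text$ = "This line is printed three times"
- FOR Pass = 1 TO 3
-   PRINT #1, Text$; CHR$(13);        'return to column 1 on the same line
- NEXT
- PRINT #1, CHR$(10);                 'now advance the paper one line
- CLOSE
-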
- Most printers also accept a CHR$(8) to indicate a backspace, which may
- simplify underlining in some cases. That is, instead of printing a
- CHR$(13) to go to the start of the line, you would print the string, and
- simply back up the print head the appropriate number of columns. BASIC's
- STRING$ function is ideal for this, using LPRINT STRING$(Count, 8); to send
- Count backspace characters to the printer.
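- Here is one way the backspace approach could be used to underline a
- single word in the middle of a line. It assumes a printer that honors
- CHR$(8), and it opens the printer with the BIN option as in the earlier
- examples.
-
- OPEN "LPT1:BIN" FOR OUTPUT AS #1
- Text$ = "Techniques"
- PRINT #1, "BASIC "; Text$;
- PRINT #1, STRING$(LEN(Text$), 8);   'back up over the word
- PRINT #1, STRING$(LEN(Text$), "_"); 'underline it
- PRINT #1, " and Utilities"
- CLOSE
-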
- You can also send a complete font file to a printer with the CopyFile
- program shown earlier. Simply give the font file's name as the source, and
- the string "LPT1:BIN" as the destination.
-
-
- THE SCREEN DEVICE
-
- As with printers, there are a number of ways to manipulate the display
- screen by printing special control characters. Where a CHR$(12) can be
- used to advance the printer to the top of the next page, this same
- character will clear the screen and place the cursor at the upper left
- corner. Printing a CHR$(11) will home the cursor only, and printing a
- CHR$(7) beeps the speaker.
- Another useful screen control character is CHR$(9), which advances to
- the next tab stop. Tab stops are located at every eighth column, with the
- first at column 9, the second at column 17, and so forth. As with a
- printer that has not been opened using the BIN option, printing either a
- CHR$(10) or a CHR$(13)--even with a semicolon--always sends the cursor to
- the beginning of the next line. There is unfortunately no way to separate
- the actions of a carriage return and line feed.
- The last four control characters that are useful with the screen are
- CHR$(28), CHR$(29), CHR$(30), and CHR$(31). These move the cursor forward,
- backward, up a line (if possible) and down a line (if possible). Although
- LOCATE can be used to move the cursor, these commands allow you to do it
- relative to the current location. To do the same with LOCATE would require
- code like this: IF POS(0) > 1 THEN LOCATE , POS(0) - 1. Obviously, the
- control characters will result in less generated code, because they avoid
- the IF test and repeated calls to BASIC's POS(0) function.
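- For example, the fragment below prints a value, backs up over it with
- five CHR$(29) characters, and then prints a replacement in the same
- columns. The text and numbers are arbitrary, and this assumes that the
- control characters have not been disabled by a stub file.
-
- CLS
- PRINT "Total: 12345";
- PRINT STRING$(5, 29);               'move the cursor left five columns
- PRINT "67890"                       'overwrite the original digits
-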
- BASIC PDS includes a series of stub files named TSCNIOxx.OBJ that
- eliminate support for all graphics statements, and also ignore the control
- characters listed above. Because each character must be tested
- individually by BASIC as it looks for these control codes, using these stub
- files will increase the speed of your program's display output.
- All versions of Microsoft BASIC have included the WIDTH statement for
- controlling the number of columns on the screen. With the introduction of
- QuickBASIC 3.0, WIDTH was expanded to also allow setting the number of
- rows on EGA and VGA monitors. The statement WIDTH , 43 puts the screen
- into the 43-line text mode, and may be used with an EGA or VGA display.
- WIDTH , 50 is valid for VGA monitors only, and as you can imagine, it
- switches the display to the 50-line text mode.
- In many cases it is necessary to know if the display screen is color
- or monochrome, and also if it is capable of supporting the EGA or VGA
- graphics modes. The simplest way to detect a color monitor is to look at
- the display adapter's port address in low memory. The short code fragment
- below shows how this is done.
-
-
- DEF SEG = 0
- IF PEEK(&H463) = &HB4 THEN
- '---- it's a monochrome monitor
- ELSE
- '---- it's a color monitor
- END IF
-
-
- This information is important if you plan to BLOAD a screen image directly
- into video memory. If the display adapter is reported as monochrome, then
- you would use DEF SEG to set the segment to &HB000. A color monitor in
- text mode instead uses segment &HB800. Knowing if a monitor has color
- capabilities also helps you to choose appropriate color values, and tells
- you if it can support graphics. To select a graphics mode, however, you
- also need to know which video modes the display adapter is capable of.
- Detecting an EGA or VGA is more complex than merely distinguishing
- between monochrome and color, because it requires calling a video interrupt
- service routine located on the display adapter card. A Hercules adapter is
- also difficult to detect, because that requires a timing loop to see if the
- Hercules video status port changes. All of this is taken into account in
- the example and function that follow.
-
- DEFINT A-Z
-
- DECLARE SUB Interrupt (IntNum, InRegs AS ANY, OutRegs AS ANY)
- DECLARE FUNCTION Monitor% (Segment)
-
- TYPE RegType
- AX AS INTEGER
- BX AS INTEGER
- CX AS INTEGER
- DX AS INTEGER
- BP AS INTEGER
- SI AS INTEGER
- DI AS INTEGER
- Flags AS INTEGER
- END TYPE
- DIM SHARED InRegs AS RegType, OutRegs AS RegType
-
- SELECT CASE Monitor%(Segment)
- CASE 1
- PRINT "Monochrome";
- CASE 2
- PRINT "Hercules";
- CASE 3
- PRINT "CGA";
- CASE 4
- PRINT "EGA";
- CASE 5
- PRINT "VGA";
- CASE ELSE
- PRINT "Unknown";
- END SELECT
- PRINT " monitor at segment &H"; HEX$(Segment)
-
- FUNCTION Monitor% (Segment) STATIC
-
- DEF SEG = 0 'first see if it's color or mono
- Segment = &HB800 'assume color
-
- IF PEEK(&H463) = &HB4 THEN 'it's monochrome
-
- Segment = &HB000 'assign the monochrome segment
- Status = INP(&H3BA) 'get the current video status
- FOR X = 1 TO 30000 'test for a Hercules 30000 times
- IF INP(&H3BA) <> Status THEN
- Monitor% = 2 'the port changed, it's a Herc
- EXIT FUNCTION 'all done
- END IF
- NEXT
- Monitor% = 1 'it's a plain monochrome
-
- ELSE 'it's some sort of color monitor
-
- InRegs.AX = &H1A00 'first test for VGA
- CALL Interrupt(&H10, InRegs, OutRegs)
- IF (OutRegs.AX AND &HFF) = &H1A THEN
- Monitor% = 5 'it's a VGA
- EXIT FUNCTION 'all done
- END IF
-
- InRegs.AX = &H1200 'now test for EGA
- InRegs.BX = &H10
- CALL Interrupt(&H10, InRegs, OutRegs)
- IF (OutRegs.BX AND &HFF) = &H10 THEN
- Monitor% = 3 'if BL is still &H10 it's a CGA
- ELSE
- Monitor% = 4 'otherwise it's an EGA
- END IF
-
- END IF
-
- END FUNCTION
-
- The Monitor function returns both the type of monitor that is active, as
- well as the video segment that is used when displaying text. EGA and VGA
- displays use segment &HA000 for graphics, which is a different issue
- altogether. Monitor is particularly valuable when you need to know what
- SCREEN modes a given display adapter can support. The *only* alternative
- is to use ON ERROR and try each possible SCREEN value in a loop starting
- from the highest resolution. When SCREEN finally reaches a low enough
- value to succeed, then you know what modes are legal. Since BASIC knows
- the type of monitor installed, it seems inconceivable to me that this
- information is not made available to your program. [PowerBASIC uses an
- internal variable to hold the display type, and that variable is available
- to the programmer.]
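- For comparison, the ON ERROR alternative might be sketched as shown
- below. The variable names and the BadMode label are my own, and the loop
- simply starts at the highest-numbered mode, which is not necessarily the
- one with the highest resolution. If you compile this with BC.EXE, the
- RESUME NEXT statement requires the /X switch.
-
- DEFINT A-Z
- ON ERROR GOTO BadMode
- FOR Mode = 13 TO 1 STEP -1
-   Worked = -1                       'assume this mode will succeed
-   SCREEN Mode                       'an illegal mode jumps to BadMode
-   IF Worked THEN EXIT FOR
- NEXT
- ON ERROR GOTO 0
- SCREEN 0: WIDTH 80
- PRINT "Highest SCREEN mode accepted:"; Mode
- END
-
- BadMode:
-   Worked = 0
-   RESUME NEXT
-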
- Notice that the InRegs and OutRegs TYPE variables are dimensioned in
- the example portion of this program, and not in the Monitor function
- itself. Each time a TYPE or fixed-length string variable is dimensioned in
- a STATIC subprogram or function, new memory is allocated permanently to
- hold it. In this short program the register variables are used by only one
- routine. But in a real program that incorporates many of the routines from
- this chapter, memory can be saved by using DIM SHARED in the main program.
- Then, each subroutine can use the same variables for its own use.
- Once you know the type of monitor, you will also know what color
- combinations are valid and readable. A color monitor can of course use any
- combination of foreground and background colors, but a monochrome is
- limited to the choices shown in Table 6-3. Combinations not listed will
- result in text that is unreadable on many monochrome monitors.
-
-
- Color as Displayed                  COLOR Values
- ────────────────────────────────    ────────────
- White on Black                      COLOR 7, 0
- Bright White on Black               COLOR 15, 0
- Black on White                      COLOR 0, 7
- White Underlined on Black           COLOR 1, 0
- Bright White Underlined on Black    COLOR 9, 0
-
- Table 6-3: Valid Color Combinations For Use With a Monochrome Monitor.
-
-
- It is important to point out that some computers employ a CGA display
- adapter connected to a monochrome monitor. For example, the original
- Compaq portable PC used this arrangement. Many laptop computers also have
- a monochrome display connected to a CGA, EGA, or VGA adapter. Since it is
- impossible for a program to look beyond the adapter hardware through to the
- monitor itself, you will need to provide a way for users with that kind of
- hardware to alert your program.
- The BASIC editor recognizes a /b command line switch to indicate black
- and white operation, and I suggest that you do something similar. Indeed,
- many commercial programs offer a way for the user to indicate that color
- operation is not available or desired.
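- One simple way to honor such a switch in your own programs is to examine
- COMMAND$ when the program starts. In the sketch below the variable names
- and the particular color values are assumptions chosen for illustration.
-
- DEFINT A-Z
- IF INSTR(UCASE$(COMMAND$), "/B") THEN
-   Normal = 7: Bright = 15           'safe on any type of display
- ELSE
-   Normal = 11: Bright = 14          'cyan and yellow, as an example
- END IF
-
- COLOR Normal, 0: PRINT "Normal text"
- COLOR Bright, 0: PRINT "Highlighted text"
- COLOR 7, 0
-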
- The last video-related issue I want to cover is saving and loading
- text and graphics images. As you probably know, the memory organization of
- a display adapter when it is in one of the graphics modes is very different
- from its organization in text mode. In the text mode, each character and its
- corresponding color byte are stored in contiguous memory locations in the
- appropriate video segment. All of the color text modes store the
- characters and their colors at segment &HB800, while monochrome displays
- use segment &HB000.
- The character in the upper left corner of the screen is at address 0
- in the video segment, and its corresponding color is at address 1. The
- character currently at screen location (1, 2) is stored at address 2, and
- its color is at address 3, and so forth. The brief program fragment below
- illustrates this visually by using POKE to write a string of characters and
- colors directly to display memory.
-
- DEFINT A-Z
-
- CLS
- LOCATE 20
- PRINT "Keep pressing a key to continue"
-
- DEF SEG = 0
- IF PEEK(&H463) = &HB4 THEN
- DEF SEG = &HB000
- ELSE
- DEF SEG = &HB800
- END IF
-
- Test$ = "Hello!"
- Colr = 9 'bright blue or underlined
-
- FOR X = 1 TO LEN(Test$) 'walk through the string
- Char = ASC(MID$(Test$, X, 1)) 'get this character
- POKE Address, Char 'poke it to display memory
- WHILE LEN(INKEY$) = 0: WEND 'pause for a keypress
- POKE Address + 1, Colr 'now poke the color
- Address = Address + 2 'bump to the next address
- WHILE LEN(INKEY$) = 0: WEND 'pause for a keypress
- NEXT
- END
-
- The initial CLS command stores blank spaces and the current BASIC color
- settings in every memory address pair. Assuming you have not changed the
- color previously, a character value of 32 is stored by CLS into every even
- address, and a color value of 7 in every odd one. Once the correct video
- segment is known and assigned using DEF SEG, a simple loop pokes each
- character in the string to the display starting at address 0. (Since
- Address was never assigned initially, it holds a value of zero.)
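- Given this layout, saving a text screen amounts to saving the 4000 bytes
- that hold its characters and colors. The fragment below is a sketch that
- assumes an 80-column by 25-line screen showing page 0; the SCREEN.TXT file
- name is arbitrary.
-
- DEFINT A-Z
- DEF SEG = 0
- IF PEEK(&H463) = &HB4 THEN
-   DEF SEG = &HB000                  'monochrome text segment
- ELSE
-   DEF SEG = &HB800                  'color text segment
- END IF
-
- BSAVE "SCREEN.TXT", 0, 4000         '80 x 25 characters x 2 bytes each
- CLS
- PRINT "The screen was saved, press a key to restore it."
- WHILE LEN(INKEY$) = 0: WEND
- BLOAD "SCREEN.TXT", 0
-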
- Saving and loading graphics images is of necessity somewhat more
- complex, because you need to know not only the appropriate segment from
- which to save, but also how many bytes. The example program below creates
- a simple graphic image in CGA screen mode 1, saves the image, and then
- after clearing the screen loads it again.
-
- DEFINT A-Z
- SCREEN 1
-
- DEF SEG = 0
- PageSize = PEEK(&H44C) + 256 * PEEK(&H44D)
-
- FOR X = 1 TO 10
- CIRCLE (140, 95), X * 10, 2
- NEXT
-
- DEF SEG = &HB800
- BSAVE "CIRCLES.CGA", 0, PageSize
- PRINT "The screen was just saved, press a key."
- WHILE LEN(INKEY$) = 0: WEND
-
- CLS
- PRINT "Now press a key to load the screen."
- WHILE LEN(INKEY$) = 0: WEND
- BLOAD "CIRCLES.CGA", 0
-
- Notice the use of PEEK to retrieve the current video page size at addresses
- &H44C and &H44D. This is a handy value that the BIOS maintains in low
- memory, and it tells you how many bytes are occupied by the screen whatever
- its current mode. In truth, this value is often slightly higher than the
- actual screen dimensions would indicate, since it is rounded up to the next
- even video page boundary. For example, the 320 by 200 screen mode used
- here occupies 16000 bytes of display memory, yet the page size is reported
- as 16384. But this value is needed to calculate the appropriate address
- when saving video pages other than page 0. That is, page 0 begins at
- address 0 at segment &HB800, and page 1 begins at address 16384.
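- As a rough sketch of that calculation, the fragment below saves and
- reloads text page 1 rather than page 0. It assumes a color adapter with
- enough memory for several text pages, and the file name PAGE1.SCR and the
- variables Page and Offset& are invented for this example. A long integer
- holds the offset because the product can exceed 32767 for higher page
- numbers.
-
- DEFINT A-Z
- Page = 1 'hypothetical page to save
- SCREEN 0, , Page, Page 'write to and display that page
- CLS
- PRINT "This text is on page"; Page
-
- DEF SEG = 0
- PageSize = PEEK(&H44C) + 256 * PEEK(&H44D)
- Offset& = CLNG(Page) * PageSize 'where this page begins
-
- DEF SEG = &HB800
- BSAVE "PAGE1.SCR", Offset&, PageSize
- CLS
- BLOAD "PAGE1.SCR", Offset& 'restore the page
-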
- Note that many early CGA video adapters contain only 16K of memory,
- and thus do not support multiple screen pages. Also note that there is a
- small quirk in Hercules adapters that causes the page size to always be
- reported as 16384, even when the screen is in text mode. I have found this
- word to be unreliable in the EGA and VGA graphics modes.
- You might expect the pixels on a CGA graphics screen to occupy contiguous
- memory addresses, but they do not. Each horizontal line is in fact
- contiguous, but the lines themselves are interlaced. Running the short
- program below shows how the first half of the video addresses contains the
- even rows (starting at row zero), and the second half holds the odd rows.
-
-
- SCREEN 1
- DEF SEG = &HB800
- FOR X = 1 TO 15999
- POKE X, 255
- NEXT
-
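- The interlaced layout can also be expressed as a calculation. The sketch
- below fills one complete row of the SCREEN 1 display, assuming the usual
- CGA arrangement in which each row occupies 80 bytes (four pixels to a
- byte), and the odd rows begin at address 8192. The variables Y and Offset
- are names used here for illustration only.
-
- DEFINT A-Z
- SCREEN 1
- DEF SEG = &HB800
-
- Y = 101 'hypothetical row, 0 to 199
- Offset = (Y \ 2) * 80 'each row occupies 80 bytes
- IF Y AND 1 THEN Offset = Offset + 8192 'odd rows are in the second half
-
- FOR X = 0 TO 79 'walk across the row
- POKE Offset + X, 255 'set all four pixels in the byte
- NEXT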
-
- EGA and VGA displays add yet another level of complexity, because they use
- a separate video memory *plane* to store each color. Four planes are used
- for EGA and VGA, with one each to hold the red, blue, green, and intensity
- (brightness) information. Each plane is identified using the same segment
- and address, and OUT instructions are needed to select which is to be made
- currently active. This is called *bank switching*, because multiple,
- parallel banks of memory are switched in and out of the CPU's address
- space. When the red plane is active, reading and writing those memory
- locations affects only the red information on the screen. And when the
- intensity plane is made active, only the brightness of a given pixel on
- the screen is affected.
- Bank switching is needed to accommodate the enormous amount of
- information that an EGA or VGA screen can contain. For example, in EGA
- screen mode 9, each plane occupies 28,000 bytes, for a total of 112,000
- bytes of memory. This far exceeds the amount of memory the designers of
- the original IBM PC anticipated would ever be needed for display purposes.
- There simply aren't enough addresses available in the PC for video use.
- Therefore, the only way to deal with that much information is to provide
- additional memory in the EGA and VGA adapters themselves. When a program
- needs to access a memory plane, it must do that one bank at a time so it
- can be read or written by the CPU.
- The program below expands slightly on the earlier example, and shows
- how to save and load EGA and VGA screens by manipulating each video plane
- individually.
-
- DEFINT A-Z
- DECLARE SUB EgaBSave (FileName$)
- DECLARE SUB EgaBLoad (FileName$)
-
- SCREEN 9
- LOCATE 25, 1
- PRINT "Press a key to stop, and save the screen.";
-
- '---- clever video effects by Brian Giedt
- WHILE LEN(INKEY$) = 0
- T = (T MOD 150) + 1
- C = (C + 1) MOD 16
- LINE (T, T)-(300 - T, 300 - T), C, B
- LINE (300 + T, T)-(600 - T, 300 - T), C, B
- WEND
-
- LOCATE 25, 1
- PRINT "Thank You!"; TAB(75);
- CALL EgaBSave("SCREEN9")
-
- CLS
- LOCATE 25, 1
- PRINT "Now press a key to read the screen.";
- WHILE LEN(INKEY$) = 0: WEND
- LOCATE 25, 1
- PRINT TAB(75);
-
- CALL EgaBLoad("SCREEN9")
-
- SUB EgaBLoad (FileName$) STATIC
-
- 'UnREM the KILL statements to erase the saved images after they
- ' have been loaded.
-
- DEF SEG = &HA000
- OUT &H3C4, 2: OUT &H3C5, 1
- BLOAD FileName$ + ".BLU", 0
- 'KILL FileName$ + ".BLU"
-
- OUT &H3C4, 2: OUT &H3C5, 2
- BLOAD FileName$ + ".GRN", 0
- 'KILL FileName$ + ".GRN"
-
- OUT &H3C4, 2: OUT &H3C5, 4
- BLOAD FileName$ + ".RED", 0
- 'KILL FileName$ + ".RED"
-
- OUT &H3C4, 2: OUT &H3C5, 8
- BLOAD FileName$ + ".INT", 0
- 'KILL FileName$ + ".INT"
- OUT &H3C4, 2: OUT &H3C5, 15
-
- END SUB
-
- SUB EgaBSave (FileName$) STATIC
-
- DEF SEG = &HA000
- Size& = 28000 'use 38400 for VGA SCREEN 12
-
- OUT &H3CE, 4: OUT &H3CF, 0
- BSAVE FileName$ + ".BLU", 0, Size&
-
- OUT &H3CE, 4: OUT &H3CF, 1
- BSAVE FileName$ + ".GRN", 0, Size&
-
- OUT &H3CE, 4: OUT &H3CF, 2
- BSAVE FileName$ + ".RED", 0, Size&
-
- OUT &H3CE, 4: OUT &H3CF, 3
- BSAVE FileName$ + ".INT", 0, Size&
-
- OUT &H3CE, 4: OUT &H3CF, 0
-
- END SUB
-
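- In the EgaBLoad and EgaBSave subprograms, two OUT statements are needed to
- switch planes. The first is sent to an index port, and tells the adapter
- which of its internal registers the byte that follows is meant for. The
- second value, sent to the adjacent data port, then indicates which memory
- plane to make currently available. Note that loading and saving use
- different registers: writes are controlled by a bit mask at ports &H3C4
- and &H3C5 (1, 2, 4, or 8, with 15 enabling all four planes), while reads
- are controlled by a plane number from 0 to 3 at ports &H3CE and &H3CF.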
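- Because the plane values follow a regular pattern, the four blocks in each
- subprogram can be folded into a loop. The version below is only a sketch
- of that idea, not a replacement for the routines above. The names
- EgaPlaneSave and EgaPlaneLoad and the .PL0 through .PL3 file extensions
- are invented for this example, and the size again assumes SCREEN 9.
-
- SUB EgaPlaneSave (FileName$) STATIC
- DEF SEG = &HA000
- Size& = 28000 'use 38400 for VGA SCREEN 12
- FOR Plane = 0 TO 3
- OUT &H3CE, 4: OUT &H3CF, Plane 'select the plane to read
- BSAVE FileName$ + ".PL" + CHR$(48 + Plane), 0, Size&
- NEXT
- OUT &H3CE, 4: OUT &H3CF, 0 'restore the default
- END SUB
-
- SUB EgaPlaneLoad (FileName$) STATIC
- DEF SEG = &HA000
- Mask = 1 'write mask is 1, 2, 4, then 8
- FOR Plane = 0 TO 3
- OUT &H3C4, 2: OUT &H3C5, Mask 'select the plane to write
- BLOAD FileName$ + ".PL" + CHR$(48 + Plane), 0
- Mask = Mask * 2
- NEXT
- OUT &H3C4, 2: OUT &H3C5, 15 'enable all four planes again
- END SUB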
-
-
- THE KEYBOARD DEVICE
-
- The last device to consider is the keyboard. BASIC offers several commands
- and functions for accessing the keyboard, and these are INPUT, LINE INPUT,
- INPUT$, and INKEY$. Further, the "KYBD:" device may be opened as a file,
- and read using the file versions of the first three statements.
- As with the file versions, INPUT reads numbers or text up to a
- terminating comma or Enter character. LINE INPUT is for strings only, and
- it ignores commas and requires Enter to be pressed to indicate the end of
- the line. INPUT$ waits until the specified number of characters have been
- typed before returning, without regard to what characters are entered.
- INKEY$ returns to the program immediately, even if no key was pressed.
- Few serious programmers ever use INPUT or LINE INPUT for accepting
- entire lines of text, unless the program is very primitive or will be used
- only occasionally. The major problem with INPUT and LINE INPUT is that
- there's no way to control how many characters the operator enters. Once
- you use INPUT or LINE INPUT, you have lost control entirely until the user
- presses Enter. Worse, when INPUT is used to enter numeric variables, an
- erroneous entry causes BASIC to print its infamous "Redo from start"
- message. Either of these can spoil the appearance of a carefully designed
- data entry screen.
- Therefore, the only reasonable way to accept user input is to use
- INKEY$ to read the keys one by one, and act on them individually. If a
- character key is pressed, the cursor is advanced and the character is added
- to the string. If the back space key is detected, the cursor is moved to
- the left one column and the current character is erased. A series of IF or
- CASE statements is often used for this purpose, to handle every key that
- needs to be recognized.
- The Editor input routine below provides exactly this service, and also
- tells you how editing was terminated. Besides being able to control
- the size of the input editing field, Editor also handles the Insert and
- Delete keys, and recognizes Home and End to jump to the beginning and end
- of the field. A single COLOR statement lets you control the editing field
- color independently of the rest of the screen. The first portion of the
- code shows how Editor is set up and called.
-
- DEFINT A-Z
- DECLARE SUB Editor (Text$, LeftCol, RightCol, KeyCode)
-
- COLOR 7, 1 'clear to white on blue
- CLS
-
- Text$ = "This is a test" 'make some sample text
- LeftCol = 20 'set the left column
- RightCol = 60 'and the right column
- LOCATE 10 'set the line number
- COLOR 0, 7 'set the field color
-
- DO 'edit until Enter or Esc
- CALL Editor(Text$, LeftCol, RightCol, KeyCode)
- LOOP UNTIL KeyCode = 13 OR KeyCode = 27
-
- SUB Editor (Text$, LeftCol, RightCol, KeyCode)
-
- '----- Find the cursor's size.
- DEF SEG = 0
- IF PEEK(&H463) = &HB4 THEN
- CsrSize = 12 'mono uses 13 scan lines
- ELSE
- CsrSize = 7 'color uses 8
- END IF
-
- '----- Work with a temporary copy.
- Edit$ = SPACE$(RightCol - LeftCol + 1)
- LSET Edit$ = Text$
-
- '----- See where to begin editing and print the string.
- TxtPos = POS(0) - LeftCol + 1
- IF TxtPos < 1 THEN TxtPos = 1
- IF TxtPos > LEN(Edit$) THEN TxtPos = LEN(Edit$)
-
- LOCATE , LeftCol
- PRINT Edit$;
-
- '----- This is the main loop for handling key presses.
- DO
- LOCATE , LeftCol + TxtPos - 1, 1
-
- DO
- Ky$ = INKEY$
- LOOP UNTIL LEN(Ky$) 'wait for a keypress
-
- IF LEN(Ky$) = 1 THEN 'create a key code
- KeyCode = ASC(Ky$) 'regular character key
- ELSE 'extended key
- KeyCode = -ASC(RIGHT$(Ky$, 1))
- END IF
-
- '----- Branch according to the key pressed.
- SELECT CASE KeyCode
-
- '----- Backspace: decrement the pointer and the
- ' cursor, but ignore if in the first column.
- CASE 8
- TxtPos = TxtPos - 1
- IF TxtPos > 0 THEN 'still inside the field
- LOCATE , LeftCol + TxtPos - 1, 0
- IF Insert THEN
- MID$(Edit$, TxtPos) = MID$(Edit$, TxtPos + 1) + " "
- ELSE
- MID$(Edit$, TxtPos) = " "
- END IF
- PRINT MID$(Edit$, TxtPos);
- END IF
-
- '----- Enter or Escape: this block is optional in
- ' case you want to handle these separately.
- CASE 13, 27
- EXIT DO 'exit the subprogram
-
- '----- Letter keys: turn off the cursor to hide
- ' the printing, handle Insert mode as needed.
- CASE 32 TO 254
- LOCATE , , 0
- IF Insert THEN 'expand the string
- MID$(Edit$, TxtPos) = Ky$ + MID$(Edit$, TxtPos)
- PRINT MID$(Edit$, TxtPos);
- ELSE 'else overwrite the character
- MID$(Edit$, TxtPos) = Ky$
- PRINT Ky$;
- END IF
- TxtPos = TxtPos + 1 'update position counter
-
- '----- Left arrow: decrement the position counter.
- CASE -75
- TxtPos = TxtPos - 1
-
- '----- Right arrow: increment position counter.
- CASE -77
- TxtPos = TxtPos + 1
-
- '----- Home: jump to the first character position.
- CASE -71
- TxtPos = 1
-
- '----- End: search for the last non-blank, and
- ' make that the current editing position.
- CASE -79
- FOR N = LEN(Edit$) TO 1 STEP -1
- IF MID$(Edit$, N, 1) <> " " THEN EXIT FOR
- NEXT
- TxtPos = N + 1
- IF TxtPos > LEN(Edit$) THEN TxtPos = LEN(Edit$)
-
- '----- Insert key: toggle the Insert state and
- ' adjust the cursor size.
- CASE -82
- Insert = NOT Insert
- IF Insert THEN
- LOCATE , , , CsrSize \ 2, CsrSize
- ELSE
- LOCATE , , , CsrSize - 1, CsrSize
- END IF
-
- '----- Delete: delete the current character and
- ' reprint what remains in the string.
- CASE -83
- MID$(Edit$, TxtPos) = MID$(Edit$, TxtPos + 1) + " "
- LOCATE , , 0
- PRINT MID$(Edit$, TxtPos);
-
- '---- All other keys: exit the subprogram
- CASE ELSE
- EXIT DO
- END SELECT
-
- '----- Loop until the cursor moves out of the field.
- LOOP UNTIL TxtPos < 1 OR TxtPos > LEN(Edit$)
-
- Text$ = RTRIM$(Edit$) 'trim the text
-
- END SUB
-
- Most of the details in this subprogram do not require much explanation, and
- the code should prove simple enough to be self-documenting. However, I
- would like to discuss INKEY$ as it is used here.
- Each time INKEY$ is used it examines the keyboard buffer, to see if a
- key is pending. If not, a null string is returned. If a key is present in
- the buffer INKEY$ removes it, and returns either a 1- or 2-byte string,
- depending on what type of key it is. Normal character keys and control
- keys (entered by pressing the Ctrl key in conjunction with a regular key)
- are returned as a 1-byte string. Some special keys such as Enter and
- Escape are also returned as a 1-byte string, because they are in fact
- control keys. For example, Enter is the same as Ctrl-M, and Escape is
- identical to the Ctrl-[ key.
- The IBM PC offers additional keys and key combinations that are not
- defined by the ASCII standard, and these are returned as a 2-byte string so
- your program can identify them. Extended keys include the function keys,
- Home and End and the other cursor control keys, and Alt key combinations.
- When an extended key is returned the first character is always CHR$(0), and
- the second character corresponds to the extended key's code using a method
- defined by IBM. Therefore, you can determine if a key is extended either
- by looking for a length of two, or by examining the first character to see
- if it is a CHR$(0) zero byte.
- There are three ways to accomplish this, and which is best depends on
- the compiler you are using. The brief program fragment below shows each
- method, and the number of bytes that are generated by both compilers.
-
-
- IF LEN(X$) = 2 THEN '17 for QB4, 7 for PDS
-
- IF ASC(X$) THEN '16 for QB4, 13 for PDS
-
- IF LEFT$(X$, 1) = CHR$(0) THEN '33 for QB4, 30 for PDS
-
-
- The references to QB4 are valid for both QuickBASIC 4.0 and 4.5. The
- BASIC PDS byte counts reflect that compiler's improved code optimization;
- however, this improvement is available only with near strings. When far
- strings are used the LEN test requires the same 13 bytes as the ASC test.
- [I'll presume that VB/DOS, with its support for only far strings, also uses
- the longer byte count.]
- As you can see, the test that uses BASIC's ASC function is slightly
- better than the one that uses LEN if you are using QuickBASIC. But if you
- have BASIC PDS the LEN test is quite a bit shorter. Comparing the first
- character in the string is much worse for either compiler, because
- individual calls must be made to BASIC's LEFT$, CHR$, and string comparison
- routines. Be aware that the ASC test as shown is true when the first
- character is not zero, that is, for a regular key, so its sense is the
- reverse of the other two.
- Even though the length and address of a QuickBASIC string are stored in
- the string's descriptor and are easily available to the compiler, the BC
- compiler that comes with QuickBASIC still calls a LEN routine. Where the
- compiler *could* use CMP WORD PTR [DescriptorAddress], 2 to see if the
- string length is 2, it instead passes the address of the string descriptor
- on the stack, calls the LEN routine, and compares the result LEN returns.
- Fortunately, this optimization was added in BASIC PDS when near strings are
- used. Likewise, SADD when used with PDS near strings directly retrieves
- the string's address from the descriptor as well, instead of calling a
- library routine as QuickBASIC does.
- The Editor subprogram uses the LEN method to determine the type of key
- that was pressed, which is most efficient if you are using BASIC PDS.
- Because integer comparisons are faster and generate less code than the
- equivalent operation with strings, ASC is then used to obtain either the
- ASCII value of the key, or the value of the extended key code. The result
- is assigned to the variable KeyCode as either a positive number to indicate
- a regular ASCII key, or a negative value that corresponds to an extended
- key's code. This method helps to reduce the size of the subprogram, by
- eliminating string comparisons in each CASE statement.
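- The same positive and negative key code scheme can be packaged for use
- outside of Editor. The function below is a minimal sketch of that idea;
- the name GetKey% is invented for this example. It waits for a key, and
- then returns the code using the same convention as KeyCode in Editor.
-
- DEFINT A-Z
- DECLARE FUNCTION GetKey% ()
-
- PRINT "Press some keys to see their codes, or Escape to end."
- DO
- KeyCode = GetKey%
- PRINT KeyCode
- LOOP UNTIL KeyCode = 27
-
- FUNCTION GetKey% STATIC
- DO
- Ky$ = INKEY$
- LOOP UNTIL LEN(Ky$) 'wait for a keypress
- IF LEN(Ky$) = 1 THEN
- GetKey% = ASC(Ky$) 'regular character key
- ELSE
- GetKey% = -ASC(RIGHT$(Ky$, 1)) 'extended key
- END IF
- END FUNCTION
-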
- One important warning when using ASC is that it will generate an
- "Illegal function call" error if you pass it a null string. Therefore, in
- many cases you must include an additional test just for that:
-
-
- IF LEN(Work$) THEN
- IF ASC(Work$) THEN
- ...
- ...
- END IF
- END IF
-
-
- One solution is to create your own function--perhaps called ASCII%()--that
- does this for you. Since calling a BASIC function requires no more code
- than when BASIC calls its own routines (assuming you are using the same
- number of arguments, of course), this can also help to reduce the size of
- your programs. I like to use a return value of -1 to indicate a null
- string, as shown below.
-
-
- FUNCTION ASCII%(This$)
- IF LEN(This$) THEN
- ASCII% = ASC(This$)
- ELSE
- ASCII% = -1
- END IF
- END FUNCTION
-
-
- Now you can simply use code such as IF ASCII%(Any$) = Whatever THEN...
- confident that no error will occur and the returned value will still be
- valid.
-
-
- Redirection
-
- One clever DOS feature that many programmers are not aware of is its
- ability to redirect a program's normal input and output to a file. When a
- program is redirected, print statements go to a specified file, keyboard
- input is read from a file, or both. The actual redirection commands are
- entered by the user of your program, and your program has no idea that this
- has happened. This is really more a DOS issue than a BASIC concern, but
- it's a powerful feature and you should understand how it works.
- Redirection is useful for capturing a program's output to a disk file,
- or feeding keystrokes to a program using a predefined sequence contained in
- a file. For example, the output of the DOS DIR command can be redirected
- to a file with this command:
-
- dir *.* > anyfile
-
- Redirecting a program's input can be equally valuable. If you often format
- several diskettes at once you might create a file that contains the answer
- Y followed by an Enter character, and then run format using this:
-
- format < yesfile
-
- This way the file will provide the response to "Format another (Y/N)?".
- To redirect a program's output, start it from the DOS command line and
- place a *greater than* symbol and the output file name at the end of the
- command line:
-
- program > filename
-
- Similarly, using a *less than* sign tells DOS to replace the program's
- requests for keyboard input with the contents of the specified file, thus:
-
- program < filename
-
- You can combine both redirected input and output at the same time, and the
- order in which they are given does not matter. It is important to
- understand that redirecting a program's output to a file is similar to
- opening that file for output. That is, it is created if it didn't yet
- exist, or truncated to a length of zero if it did. However, DOS also lets
- you append to a file when redirecting output, using two symbols in a row:
-
- program >> filename
-
- Please be aware that you can hang a PC completely when redirecting a
- program's input, if the necessary characters are not present. For example,
- this would happen when redirecting a program that uses LINE INPUT from a
- file that has no terminating CHR$(13) Enter character. Even pressing Ctrl-
- Break will have no effect, and your only recourse is to reboot, or close
- down the DOS session if you are using Windows.
-
-
- SUMMARY
- =======
-
- This chapter has presented an enormous amount of information about both
- files and devices in BASIC. It began with a brief overview of how DOS
- allocates disk storage using sectors and clusters, and continued with an
- explanation of file buffers. By understanding the relationship between
- BASIC's own buffers and their impact on string memory, you gain greater
- control over your program's speed and memory requirements.
- This then led to a comparison of files and devices, and showed how
- they can be controlled by similar BASIC statements. In particular, you
- learned how the same block of code can be used to send information to
- either, simplifying the design of reports and other programming output
- chores.
- The section that described file access methods compared all of the
- available options, and explained when each is appropriate and why. You
- learned that all DOS files are really just a continuous stream of binary
- data, and the various OPEN methods merely let you indicate to BASIC how
- that data is to be handled.
- You also learned that the best way to improve a program's file access
- speed is to read and write data in large blocks. Several complete
- subprograms and functions were shown to illustrate this technique, and most
- are general enough to be useful when included within your own programs.
- Numerous tips and tricks were presented to determine the type of
- display adapter installed, run .COM programs and .BAT files, obtain
- formatted numbers by combining PRINT USING # with FIELD and INPUT #, and
- many more. You were also introduced to the possibility of calling BASIC's
- internal library routines as a way to circumvent many otherwise arbitrary
- limitations in the language.
- Finally, video memory organization was revealed for all of the popular
- screen modes, and example programs were provided to show how they may be
- saved and loaded.
- In the next chapter I will continue this discussion of files with
- detailed explanations of writing database programs. Chapter 7 will also
- describe how to write programs that operate on a network, as well as how to
- access data that uses the popular dBASE file format.